INTRONS AND EXONS OF THE CYSTIC FIBROSIS GENE
AND MUTATIONS AT VARIOUS POSITIONS OF THE GENE

FIELD OF THE INVENTION 

The present invention relates generally to the
cystic fibrosis (CF) gene, and, more particularly to the 
identification, isolation and cloning of the DNA sequence 
corresponding to mutants of the CF gene, as well as their 
transcripts, gene products and genetic information at 
exon/intron boundaries. The present invention also 
relates to methods of screening for and detection of CF 
carriers, CF diagnosis, prenatal CF screening and 
diagnosis, and gene therapy utilizing recombinant 
technologies and drug therapy using the information 
derived from the DNA, protein, and the metabolic function 
of the protein.

BACKGROUND OF THE INVENTION

Cystic fibrosis (CF) is the most common severe 
autosomal recessive genetic disorder in the Caucasian 
population. It affects approximately 1 in 2000 live
births in North America [Boat et al, The Metabolic Basis
of Inherited Disease, 6th ed, pp 2649-2680, McGraw Hill,
NY (1989)]. Approximately 1 in 20 persons are carriers of
the disease. 
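
The two figures quoted above can be cross-checked with a short calculation. The sketch below assumes Hardy-Weinberg equilibrium (an assumption that is not stated in the text) and derives the expected carrier frequency from the stated incidence of 1 in 2000; the function name is illustrative only.

```python
import math


def carrier_frequency(incidence: float) -> float:
    """Expected heterozygote (carrier) frequency for an autosomal
    recessive disorder, assuming Hardy-Weinberg equilibrium."""
    q = math.sqrt(incidence)  # mutant allele frequency, since q**2 = incidence
    p = 1.0 - q               # normal allele frequency
    return 2.0 * p * q        # frequency of heterozygotes


freq = carrier_frequency(1 / 2000)
print(f"carriers: about 1 in {1 / freq:.0f}")
```

Under that assumption the result is roughly 1 in 23, which is in line with the approximate figure of 1 in 20 given above.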

Although the disease was first described in the late 
1930's, the basic defect remains unknown. The major
symptoms of cystic fibrosis include chronic pulmonary 
disease, pancreatic exocrine insufficiency, and elevated 
sweat electrolyte levels. The symptoms are consistent 
with cystic fibrosis being an exocrine disorder. 
Although recent advances have been made in the analysis 
of ion transport across the apical membrane of the 
epithelium of CF patient cells, it is not clear that the 
abnormal regulation of chloride channels represents the 
primary defect in the disease. Given the lack of 
understanding of the molecular mechanism of the disease, 
an alternative approach has therefore been taken in an 
attempt to understand the nature of the molecular defect 
through direct cloning of the responsible gene on the
basis of its chromosomal location. 

However, there is no clear phenotype that directs an 
approach to the exact nature of the genetic basis of the 
disease, or that allows for an identification of the
cystic fibrosis gene. The nature of the CF defect in 
relation to the population genetics data has not been 
readily apparent. Both the prevalence of the disease and 
the clinical heterogeneity have been explained by several 
different mechanisms: high mutation rate,
heterozygote advantage, genetic drift, multiple loci, and
reproductive compensation. 

Many of the hypotheses cannot be tested due to the
lack of knowledge of the basic defect. Therefore,
alternative approaches to the determination and
characterization of the CF gene have focused on an
attempt to identify the location of the gene by genetic 
analysis. 

Linkage analysis of the CF gene to antigenic and
protein markers was attempted in the 1950's, but no
positive results were obtained [Steinberg et al, Am. J.
Hum. Genet. 8: 162-176, (1956); Steinberg and Morton, Am.
J. Hum. Genet. 8: 177-189, (1956); Goodchild et al, J. Med.
Genet. 7: 417-419, 1976].

More recently, it has become possible to use RFLP's 
to facilitate linkage analysis. The first linkage of an 
RFLP marker to the CF gene was disclosed in 1985 [Tsui et
al., Science 230: 1054-1057, 1985], in which linkage was
found between the CF gene and an uncharacterized marker,
DOCRI-917. The association was found in an analysis of
39 families with affected CF children. This showed that
although the chromosomal location had not been
established, the location of the disease gene had been
narrowed to about 1% of the human genome, or about 30
million nucleotide base pairs.

The chromosomal location of the DOCRI-917 probe was 
established using rodent-human hybrid cell lines 






WO 91/10734

PCT/CA91/00009

containing different human chromosome complements. It
was shown that DOCRI-917 (and therefore the CF gene) maps
to human chromosome 7.

Further physical and genetic linkage studies were
pursued in an attempt to pinpoint the location of the CF
gene. [...] et al [...] somatic cell hybrids [...]
with it. This publication shows that the CF gene can be
assigned to either the distal region of band q22 or the
proximal region of band q31 on chromosome 7.

[...] 43 [...] give a detailed discussion of the isolation of
many new 7q31 probes. The approach outlined led to the
[...] close to each other. Pulsed field gel electrophoresis
mapping indicates that these two RFLP markers are between [...]

A major difficulty in identifying the CF gene has
been the lack of cytologically detectable chromosome 
rearrangements or deletions, which greatly facilitated 
all previous successes in the cloning of human disease 
genes by knowledge of map position. 

Such rearrangements and deletions could be observed 
cytologically and as a result, a physical location on a 
particular chromosome could be correlated with the 
particular disease. Further, this cytological location 
could be correlated with a molecular location based on 
the known relationship between publicly available DNA probes
and cytologically visible alterations in the chromosomes.
Knowledge of the molecular location of the gene for a 
particular disease would allow cloning and sequencing of 
that gene by routine procedures, particularly when the 
gene product is known and cloning success can be 
confirmed by immunoassay of expression products of the 
cloned genes. 

In contrast, neither the cytological location nor 
the gene product of the gene for cystic fibrosis was 
known in the prior art. With the recent identification
of MET and D7S8, markers which flanked the CF gene but 
did not pinpoint its molecular location, the present 
inventors devised various novel gene cloning strategies 
to approach the CF gene in accordance with the present 
invention. The methods employed in these strategies 
include chromosome jumping from the flanking markers,
cloning of DNA fragments from a defined physical region 
with the use of pulsed field gel electrophoresis, a 
combination of somatic cell hybrid and molecular cloning 
techniques designed to isolate DNA fragments from 
undermethylated CpG islands near CF, chromosome
microdissection and cloning, and saturation cloning of a
large number of DNA markers from the 7q31 region. By
means of these novel strategies, the present inventors
were able to identify the gene responsible for cystic
fibrosis where the prior art was uncertain or, even in
one case, wrong. 

The application of these genetic and molecular 
cloning strategies has allowed the isolation and cDNA 
cloning of the cystic fibrosis gene on the basis of its
chromosomal location, without the benefit of genomic
rearrangements to point the way. The identification of
the normal and mutant forms of the CF gene and gene
products has allowed for the development of screening and
diagnostic tests for CF utilizing nucleic acid probes and
antibodies to the gene product. Through interaction with
the defective gene product and the pathway in which this
gene product is involved, therapy through normal gene
product supplementation and gene manipulation and
delivery are now made possible.

The gene involved in the cystic fibrosis disease 
process, hereinafter the "CF gene" and its functional 
equivalents, has been identified, isolated and cDNA 
cloned, and its transcripts and gene products identified 
and sequenced. A three base pair deletion leading to the
omission of a phenylalanine residue in the gene product
has been determined to correspond to the mutations of the
CF gene in approximately 70% of the patients affected
with CF, with different mutations involved in most if not
all the remaining cases. This subject matter is
disclosed in co-pending United States patent application
S.N. 396,894 filed August 22, 1989 and its related
continuation-in-part applications S.N. 399,945 filed
August 24, 1989 and S.N. 401,609 filed August 31, 1989.

SUMMARY OF THE INVENTION

According to this invention, other base pair 
deletions or alterations leading to the omission of amino 
acid residues in the gene product have been determined. 
According to this invention other nucleotide deletions or 
alterations leading to mutations in the DNA sequence 
resulting in frameshift or splice mutations have been 
determined. 

With the identification and sequencing of the mutant
gene and its gene product, nucleic acid probes and
antibodies raised to the mutant gene product can be used
in a variety of hybridization and immunological assays to
screen for and detect the presence of either the
defective CF gene or gene product. Assay kits for such
screening and diagnosis can also be provided. The
genetic information derived from the intron/exon
boundaries is also very useful in various screening and
diagnosis procedures.

Patient therapy through supplementation with the 
normal gene product, whose production can be amplified 
using genetic and recombinant techniques, or its 
functional equivalent, is now also possible. Correction 
or modification of the defective gene product through
drug treatment means is now possible. In addition,
cystic fibrosis can be cured or controlled through gene
therapy by correcting the gene defect in situ or using
recombinant or other vehicles to deliver a DNA sequence
capable of expression of the normal gene product to the
cells of the patient.

According to another aspect of the invention, a 
purified mutant CF gene comprises a DNA sequence encoding 
an amino acid sequence for a protein where the protein, 
when expressed in cells of the human body, is associated
with altered cell function which correlates with the 
genetic disease cystic fibrosis. 

According to another aspect of the invention, a 
purified RNA molecule comprises an RNA sequence 
corresponding to the above DNA sequence.

According to another aspect of the invention, a DNA 
molecule comprises a cDNA molecule corresponding to the 
above DNA sequence. 

According to another aspect of the invention, a DNA
molecule comprises a DNA sequence encoding mutant CFTR
polypeptide having the sequence according to the
following Figure 1 for amino acid residue positions 1 to
1480 as further characterized by nucleotide sequence
variants resulting in deletion or alteration of amino
acids at residue positions 85, 148, 178, 455, 493, 507,
542, 549, 551, 560, 563, 574, 1077 and 1092.

According to another aspect of the invention, a DNA 
molecule comprises an intronless DNA sequence encoding a 
mutant CFTR polypeptide having the sequence according to 
Figure 1 for DNA sequence positions 1 to 4575 and
further characterized by nucleotide sequence variants
resulting in deletion or alteration of DNA at DNA
sequence positions 129, 556, 621+1, 711+1, 1717-1 and
3659. 
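
The DNA sequence positions listed above mix plain cDNA coordinates (129, 556, 3659) with offset labels such as 621+1 and 1717-1. The sketch below assumes the usual reading of that notation, in which `N+k` denotes the k-th intron nucleotide following exon position N and `N-k` the k-th intron nucleotide preceding exon position N. The function and type names are hypothetical and are not taken from the specification.

```python
import re
from typing import NamedTuple


class SeqPosition(NamedTuple):
    cdna_pos: int       # nearest exon nucleotide, numbered as in the cDNA
    intron_offset: int  # 0 = within the exon; +k / -k = k nt into a flanking intron


def parse_position(label: str) -> SeqPosition:
    """Parse labels such as '556', '621+1' or '1717-1'."""
    m = re.fullmatch(r"\s*(\d+)\s*(?:([+-])\s*(\d+))?\s*", label)
    if m is None:
        raise ValueError(f"unrecognized position label: {label!r}")
    base = int(m.group(1))
    if m.group(2) is None:
        return SeqPosition(base, 0)
    offset = int(m.group(3))
    return SeqPosition(base, offset if m.group(2) == "+" else -offset)


labels = ["129", "556", "621+1", "711+1", "1717-1", "3659"]
parsed = [parse_position(s) for s in labels]
# Labels with a non-zero offset lie in an intron next to an exon boundary.
intronic = [s for s, p in zip(labels, parsed) if p.intron_offset != 0]
print(intronic)
```

Read this way, three of the six listed positions fall at exon/intron boundaries, which is consistent with the reference to splice mutations earlier in this summary.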

According to another aspect of the invention, a DNA 
molecule comprises a cDNA molecule corresponding to the 
above DNA sequence.

According to another aspect of the invention the 
cDNA molecule comprises a DNA sequence selected from the 
group consisting of: 

(a) DNA sequences which correspond to the mutant
DNA sequence selected from the group of mutant amino acid
positions of 85, 148, 178, 455, 493, 507, 542, 549, 551,
560, 563, 574, 1077 and 1092 and mutant DNA sequence
positions 129, 556, 621+1, 711+1, 1717-1 and 3659 and
which encode, on expression, for mutant CFTR polypeptide;

(b) DNA sequences which correspond to a fragment of 
the selected mutant DNA sequence, including at least 
twenty nucleotides; 

(c) DNA sequences which comprise at least twenty 
nucleotides and encode a fragment of the selected mutant
CFTR protein amino acid sequence;

(d) DNA sequences encoding an epitope encoded by at 
least eighteen sequential nucleotides in the selected 
mutant DNA sequence. 

According to another aspect of the invention, a DNA 
sequence selected from the group consisting of:

(a) DNA sequences which correspond to portions of
DNA sequences of boundaries of exons/introns of the 
genomic CF gene; 

(b) DNA sequences of at least eighteen sequential
nucleotides at boundaries of exons/introns of the genomic
CF gene depicted in Figure 18; and 

(c) DNA sequences of at least eighteen sequential 
nucleotides of intron portions of the genomic CF gene of 
Figure 18. 

According to another aspect of the invention, a
purified nucleic acid probe comprises a DNA or RNA
nucleotide sequence corresponding to the above noted
selected DNA sequences of groups (a) to (c).

According to another aspect of the invention, a
purified RNA molecule comprises an RNA sequence
corresponding to the mutant DNA sequence selected from the group
of mutant protein positions consisting of 85, 148, 178,
455, 493, 507, 542, 549, 551, 560, 563, 574, 1077 and
1092 and of mutant DNA sequence positions consisting of
129, 556, 621+1, 711+1, 1717-1 and 3659.

A purified nucleic acid probe comprising a DNA or 
RNA nucleotide sequence corresponding to the mutant 
sequences of the above recited group. 

According to another aspect of the invention, a 
recombinant cloning vector comprising the DNA sequences 
of the mutant DNA and fragments thereof selected from the
group of mutant protein positions consisting of 85, 148,
178, 455, 493, 507, 542, 549, 551, 563, 574, 1077 and
1092 and selected from the group of mutant DNA sequence 
positions consisting of 129, 556, 621+1, 711+1, 1717-1 
and 3659 is provided. The vector, according to an aspect 
of this invention, is operatively linked to an expression 
control sequence in the recombinant DNA molecule so that 
the selected mutant DNA sequences for the mutant CFTR 
polypeptide can be expressed. The expression control 
sequence is selected from the group consisting of 
sequences that control the expression of genes of 
prokaryotic or eukaryotic cells and their viruses and
combinations thereof.

According to another aspect of the invention, a 
method for producing a mutant CFTR polypeptide comprises 
the steps of: 

(a) culturing a host cell transfected with the 
recombinant vector for the mutant DNA sequence in a 
medium and under conditions favorable for expression of 
the mutant CFTR polypeptide selected from the group of 
mutant CFTR polypeptides at mutant protein positions 85, 
148, 178, 455, 493, 507, 542, 549, 551, 560, 563, 574, 
1077 and 1092 and mutant DNA sequence positions 129, 556, 
621+1, 711+1, 1717-1 and 3659; and

(b) isolating the expressed mutant CFTR
polypeptide.

According to another aspect of the invention, a 
purified protein of human cell membrane origin comprises 
an amino acid sequence encoded by the mutant DNA 
sequences selected from the group of mutant protein 
positions of 85, 148, 178, 455, 493, 507, 542, 549, 551, 
560, 563, 574, 1077 and 1092 and from the group of mutant 
DNA sequence positions 129, 556, 621+1, 711+1, 1717-1 and 
3659 where the protein, when present in human cell 
membrane, is associated with cell function which causes 
the genetic disease cystic fibrosis.

According to another aspect of the invention, a 
method is provided for screening a subject to determine 
if the subject is a CF carrier or a CF patient comprising 
the steps of providing a biological sample of the subject 
to be screened and providing an assay for detecting in 
the biological sample, the presence of at least a member 
from the group consisting of: 

(a) mutant CF gene selected from the group of 
mutant protein positions 85, 148, 178, 455, 
493, 507, 542, 549, 551, 560, 563, 574, 1077 
and 1092 and from the group of mutant DNA 
sequence positions 129, 556, 621+1, 711+1,
1717-1 and 3659; 

(b) mutant CF gene products and mixtures thereof; 

(c) DNA sequences which correspond to portions of
DNA sequences of boundaries of exons/introns of the
genomic CF gene;

(d) DNA sequences of at least eighteen sequential 
nucleotides at boundaries of exons/introns of the genomic 
CF gene depicted in Figure 18; and 

(e) DNA sequences of at least eighteen sequential
nucleotides of intron portions of the genomic CF gene of
Figure 18.

According to another aspect of the invention, a kit 
for assaying for the presence of a CF gene by immunoassay 
techniques comprises: 

(a) an antibody which specifically binds to a gene 
product of the mutant DNA sequence selected from the 
group of mutant protein positions 85, 148, 178, 455, 493, 
507, 542, 549, 551, 560, 563, 574, 1077 and 1092 and from 
the group of mutant DNA sequence positions 129, 556, 
621+1, 711+1, 1717-1 and 3659; 

(b) reagent means for detecting the binding of the 
antibody to the gene product; and 

(c) the antibody and reagent means each being 
present in amounts effective to perform the immunoassay. 

According to another aspect of the invention, a kit 
for assaying for the presence of a mutant CF gene by 
hybridization technique comprises: 

(a) an oligonucleotide probe which specifically 
binds to the mutant CF gene having a mutation at a 
protein position selected from the group consisting of 
85, 148, 178, 455, 493, 507, 542, 549, 551, 560, 563, 
574, 1077 and 1092 or having a mutation at a DNA sequence 
position selected from the group consisting of 129, 556, 
621+1, 711+1, 1717-1 and 3659; 

(b) reagent means for detecting the hybridization 
of the oligonucleotide probe to the mutant CF gene; and 

(c) the probe and reagent means each being present
in amounts effective to perform the hybridization assay. 
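
A probe that "specifically binds" a mutant allele can be pictured as an oligonucleotide that pairs perfectly with the mutant sequence but carries a mismatch against the normal one. The sketch below is a simplified illustration only: the sequences and the single-base substitution are invented, the helper names are hypothetical, and mismatches are merely counted, with no attempt to model hybridization conditions.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[b] for b in reversed(seq))


def design_probe(target: str, site: int, length: int = 19) -> str:
    """Oligonucleotide complementary to `target`, centered on 0-based `site`."""
    start = site - length // 2
    if start < 0 or start + length > len(target):
        raise ValueError("site too close to the end of the sequence")
    return reverse_complement(target[start:start + length])


def mismatches(probe: str, target: str, site: int) -> int:
    """Count mismatched positions when `probe` is aligned over `site`."""
    start = site - len(probe) // 2
    window = target[start:start + len(probe)]
    paired_strand = reverse_complement(probe)  # strand the probe pairs with
    return sum(a != b for a, b in zip(paired_strand, window))


# Invented example: a G-to-A substitution at index 15 of a toy sequence.
normal = "GATTACAGGCTTACCGGATCGTAAGCTTGCA"
site = 15
mutant = normal[:site] + "A" + normal[site + 1:]

probe = design_probe(mutant, site)
print(mismatches(probe, mutant, site), mismatches(probe, normal, site))
```

Placing the variant base near the middle of the probe is a common design choice, since a central mismatch tends to destabilize the duplex more than one near an end.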

According to another aspect of the invention, an
animal comprises a heterologous cell system. The cell
system includes a recombinant cloning vector which
includes the recombinant DNA sequence corresponding to
the mutant DNA sequence which induces cystic fibrosis
symptoms in the animal.

According to another aspect of the invention, in a
polymerase chain reaction to amplify a selected exon of a
cDNA sequence of Figure 1, the use of oligonucleotide 
primers from intron portions near the 5' and 3' 
boundaries of the selected exon of Figure 18. 

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is the nucleotide sequence of the CF gene
and the amino acid sequence of the CFTR protein,
with t indicating mutations at the 507 and
508 protein positions.

Figure 2 is a restriction map of the CF gene and the
schematic strategy used to chromosome walk and jump to
the gene.

Figure 3 depicts the physical map of the region 
including and surrounding the CF gene generated by pulsed 
field gel electrophoresis. Panels A, B, C, and D show
hybridization data for the restriction enzymes Sal I, Xho
I, Sfi I, and Nae I, respectively, generated by
representative genomic and cDNA probes which span the
region. The deduced physical map for each restriction
enzyme is shown below each panel. A composite map of the
entire MET-D7S8 interval is shown in panel E (J.M.
Rommens et al., Am. J. Hum. Genet. 45:932-941, 1990).
The open boxed segment indicates the portion cloned by 
chromosome walking and jumping, and the filled arrow 
indicates the portion covered by the CF transcript. 

Figures 4A, 4B and 4C show the detection of
conserved nucleotide sequences by cross-species
hybridization.



Figure 4D is a restriction map of overlapping
segments of probes E4.3 and H1.6.

Figure 5 is an RNA blot hybridization analysis using
genomic and cDNA probes. Hybridization to RNA of: A -
fibroblast with cDNA probe G-2; B - trachea (from
unafflicted and CF patient individuals), pancreas, liver,
HL60 cell line and brain with genomic probe CF16; C - T84
cell line with cDNA probe 10-1.

Figure 6 is the methylation status of the E4.3 
cloned region at the 5' end of the CF gene.

Figure 7 is a restriction map of the CFTR cDNA 
showing alignment of the cDNA to the genomic DNA 
fragments. 

Figure 8 is an RNA gel blot analysis depicting 
hybridization by a portion of the CFTR cDNA (clone 10-1)
to a 6.5 kb mRNA transcript in various human tissues. 

Figure 9 is a DNA blot hybridization analysis
depicting hybridization by the CFTR cDNA clones to
genomic DNA digested with EcoRI and Hind III.

Figure 10 is a primer extension experiment
characterizing the 5' and 3' ends of the CFTR cDNA.

Figure 11 is a hydropathy profile and shows
predicted secondary structures of CFTR. 

Figure 12 is a dot matrix analysis of internal 
homologies in the predicted CFTR polypeptide. 

Figure 13 is a schematic model of the predicted CFTR 
protein. 

Figure 14 is a schematic diagram of the restriction 
fragment length polymorphisms (RFLP's) closely linked to 
the CF gene where the inverted triangle indicates the 
location of the F508 3 base pair deletion. 

Figure 15 represents alignment of the most conserved 
segments of the extended NBFs of CFTR with comparable 
regions of other proteins. 
Figure 16 is the DNA sequence around the F508
deletion.

Figure 17 is a representation of the nucleotide
sequencing gel showing the DNA sequence at the F508 
deletion. 

Figure 18 is the nucleotide sequence of the portions
of introns and complete exons of the genomic CF gene for
27 exons identified and numbered sequentially as 1
through 24 with additional exons 6a, 6b, 14a, 14b and
17a, 17b of the cDNA sequence of Figure 1;

Figure 19 shows the results of amplification of
genomic DNA using intron oligonucleotides bounding exon
10;

Figure 20 shows the separation by gel
electrophoresis of the amplified genomic DNA products of
a CF family; and

Figure 21 is a restriction mapping of cloned intron
and exon portions of genomic DNA which introns and exons
are identified in Figure 18.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

1. DEFINITIONS

In order to facilitate review of the various
embodiments of the invention and an understanding of
various elements and constituents used in making the
invention and using same, the following definitions of
terms used in the invention description are provided:

CF - cystic fibrosis

CF carrier - a person in apparent health whose 
chromosomes contain a mutant CF gene that may be 
transmitted to that person's offspring. 

CF patient - a person who carries a mutant CF gene
on each chromosome, such that they exhibit the clinical
symptoms of cystic fibrosis.

CF gene - the gene whose mutant forms are associated
with the disease cystic fibrosis. This definition is
understood to include the various sequence polymorphisms
that exist, wherein nucleotide substitutions in the gene
sequence do not affect the essential function of the gene
product. This term primarily relates to an isolated
coding sequence, but can also include some or all of the
flanking regulatory elements and/or introns.

Genomic CF gene - the CF gene which includes 
flanking regulatory elements and/or introns at boundaries 
of exons of the CF gene. 

CF - PI - cystic fibrosis pancreatic insufficient, 
the major clinical subgroup of cystic fibrosis patients, 
characterized by insufficient pancreatic exocrine 
function. 

CF - PS - cystic fibrosis pancreatic sufficient, a
clinical subgroup of cystic fibrosis patients with 
sufficient pancreatic exocrine function for normal 
digestion of food. 

CFTR - cystic fibrosis transmembrane conductance
regulator protein, encoded by the CF gene. This
definition includes the protein as isolated from human or
animal sources, as produced by recombinant organisms, and
as chemically or enzymatically synthesized. This
definition is understood to include the various
polymorphic forms of the protein wherein amino acid
substitutions in the variable regions of the sequence
do not affect the essential functioning of the protein,
or its hydropathic profile or secondary or tertiary
structure.

DNA - standard nomenclature is used to identify the 
bases. 

Intronless DNA - a piece of DNA lacking internal 
non-coding segments, for example, cDNA. 

IRP locus sequence - (protooncogene int-1 related), 
a gene located near the CF gene. 

Mutant CFTR - a protein that is highly analogous to
CFTR in terms of primary, secondary, and tertiary
structure, but wherein a small number of amino acid
substitutions and/or deletions and/or insertions result
in impairment of its essential function, so that
organisms whose epithelial cells express mutant CFTR
rather than CFTR demonstrate the symptoms of cystic
fibrosis.

mCF - a mouse gene orthologous to the human CF gene

NBFs - nucleotide (ATP) binding folds

ORF - open reading frame

PCR - polymerase chain reaction 

Protein - standard single letter nomenclature is 
used to identify the amino acids 

R-domain - a highly charged cytoplasmic domain of
the CFTR protein

RSV - Rous Sarcoma Virus

SAP - surfactant protein

RFLP - restriction fragment length polymorphism

507 mutant CF gene - the CF gene which includes a
DNA base pair mutation at the 506 or 507 protein position
of the cDNA of the CF gene

507 mutant DNA sequence - equivalent meaning to the 
507 mutant CF gene 

507 mutant CFTR protein or mutant CFTR protein amino
acid sequence, or mutant CFTR polypeptide - the mutant
CFTR protein wherein an amino acid deletion occurs at the
isoleucine 506 or 507 protein position of the CFTR.

Protein position means amino acid residue position. 

ISOLATING THE CF GENE

Using chromosome walking, jumping, and cDNA
hybridization, DNA sequences encompassing > 500 kilobase
pairs (kb) have been isolated from a region on the long
arm of human chromosome 7 containing the cystic fibrosis
(CF) gene. This technique is disclosed in detail in the
aforementioned co-pending United States patent
applications. For purposes of convenience in
understanding and isolating the CF gene and identifying
other mutations, such as at the 85, 148, 178, 455, 493,
507, 542, 549, 560, 563, 574, 1077 and 1092 amino acid
residue positions, the technique is reiterated here.

Several transcribed sequences and conserved segments have 
been identified in this region. One of these corresponds 
to the CF gene and spans approximately 250 kb of genomic
DNA. Overlapping complementary DNA (cDNA) clones have
been isolated from epithelial cell libraries with a
genomic DNA segment containing a portion of the cystic
fibrosis gene. The nucleotide sequence of the isolated
cDNA is shown in Figures 1 through 18. In each row of
the respective sequences the lower row is a list by
standard nomenclature of the nucleotide sequence. The
upper row in each respective row of sequences is standard
single letter nomenclature for the amino acid
corresponding to the respective codon. 

Accordingly, the isolation of the CF gene provided a 
cDNA molecule comprising a DNA sequence selected from the 
group consisting of: 

(a) DNA sequences which correspond to the DNA
sequence of Figure 1 from amino acid residue position 1
to position 1480;

(b) DNA sequences encoding normal CFTR polypeptide
having the sequence according to Figure 1 for amino acid
residue positions from 1 to 1480;

(c) DNA sequences which correspond to a fragment of
the sequence of Figure 1 including at least 16 sequential
nucleotides between amino acid residue positions 1 and
1480;

(d) DNA sequences which comprise at least 16
nucleotides and encode a fragment of the amino acid
sequence of Figure 1; and

(e) DNA sequences encoding an epitope encoded by at
least 18 sequential nucleotides in the sequence of Figure
1 between amino acid residue positions 1 and 1480.

According to this invention, the isolation of other 
mutations in the CF gene also provides a cDNA molecule 
comprising a DNA sequence selected from the group 
consisting of: 

a) DNA sequences which correspond to the DNA
sequence encoding mutant CFTR polypeptide characterized
by cystic fibrosis-associated activity in human
epithelial cells, or the DNA sequence of Figure 1 for the
amino acid residue positions 1 to 1480 yet further
characterized by a base pair mutation which results in
the deletion of or a change for an amino acid at residue
positions 85, 148, 178, 455, 493, 507, 542, 549, 551,
560, 563, 574, 1077 and 1092;

b) DNA sequences which correspond to fragments of
the mutant portion of the sequence of paragraph a) and
which include at least sixteen nucleotides;

c) DNA sequences which comprise at least sixteen
nucleotides and encode a fragment of the amino acid
sequence encoded for by the mutant portion of the DNA
sequence of paragraph a); and

d) DNA sequences encoding an epitope encoded by at
least 18 sequential nucleotides in the mutant portion of
the sequence of the DNA of paragraph a).

Transcripts of approximately 6,500 nucleotides in 
size are detectable in tissues affected in patients with 
CF. Based upon the isolated nucleotide sequence, the 
predicted protein consists of two similar regions, each
containing a first domain having properties consistent 
with membrane association and a second domain believed to 
be involved in ATP binding. 

A 3 bp deletion which results in the omission of a
phenylalanine residue at the center of the first
predicted nucleotide binding domain (amino acid position
508 of the CF gene product) was detected in CF patients.
This mutation in the normal DNA sequence of Figure 1
corresponds to approximately 70% of the mutations in
cystic fibrosis patients. Extended haplotype data based
on DNA markers closely linked to the putative disease
gene suggest that the remainder of the CF mutant gene
pool consists of multiple, different mutations. This is
now exemplified by this invention at, for example, the
506 or 507 protein position. A small set of these latter
mutant alleles (approximately 8%) may confer residual
pancreatic exocrine function in a subgroup of patients
who are pancreatic sufficient. 
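
The effect of an in-frame 3 bp deletion can be illustrated on a short stretch of coding sequence. The fragment below is meant to represent codons 506 to 510 of the CF cDNA as commonly reported; this is an assumption made here for illustration, and Figure 1 remains the authoritative sequence. The codon table covers only the codons needed, and the helper names are hypothetical. Removing three nucleotides that straddle two codons drops the phenylalanine while leaving an isoleucine codon and the downstream reading frame intact.

```python
# Partial codon table: only the codons that occur in this example.
CODONS = {"ATC": "I", "ATT": "I", "TTT": "F", "GGT": "G", "GTT": "V"}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string into single-letter amino acids."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))


def delete_bases(dna: str, start: int, length: int = 3) -> str:
    """Remove `length` bases beginning at 0-based index `start`."""
    return dna[:start] + dna[start + length:]


# Assumed fragment: Ile506 Ile507 Phe508 Gly509 Val510.
normal = "ATCATCTTTGGTGTT"
mutant = delete_bases(normal, 5)  # removes "CTT", spanning codons 507 and 508

print(translate(normal), translate(mutant))
```

Because the deleted stretch is a multiple of three nucleotides, the residues after the deletion are unchanged, in contrast to the frameshift mutations mentioned in the summary.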

CHROMOSOME WALKING AND JUMPING

Large amounts of the DNA surrounding the D7S122 and
D7S340 linkage regions of Rommens et al supra were
searched for candidate gene sequences. In addition to 
conventional chromosome walking methods, chromosome 
jumping techniques were employed to accelerate the search 
process. From each jump endpoint a new bidirectional 
walk could be initiated. Sequential walks halted by 
"unclonable" regions often encountered in the mammalian 
genome could be circumvented by chromosome jumping. 

The chromosome jumping library used has been
described previously [Collins et al, Science 235, 1046
(1987); Ianuzzi et al, Am. J. Hum. Genet. 44, 695
(1989)]. The original library was prepared from a
preparative pulsed field gel, and was intended to contain
partial EcoRI fragments of 70 - 130 kb; subsequent
experience with this library indicates that smaller
fragments were also represented, and jump sizes of 25 -
110 kb have been found. The library was plated on sup-
host MC1061 and screened by standard techniques
[Maniatis et al]. Positive clones were subcloned into
pBRA23Ava and the beginning and end of the jump
identified by EcoRI and Ava I digestion, as described in
Collins, Genome analysis: A practical approach (IRL,
London, 1988), pp. 73-94. For each clone, a fragment
from the end of the jump was checked to confirm its
location on chromosome 7. The contiguous chromosome
region covered by chromosome walking and jumping was
about 250 kb. Direction of the jumps was biased by
careful choice of probes, as described by Collins et al
and Ianuzzi et al, supra. The entire region cloned,
including the sequences isolated with the use of the CF 
gene cDNA, is approximately 500 kb. 
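
The way overlapping walk clones add up to a contiguous covered region, while a jump starts a new region some distance away, can be sketched as a simple interval merge. The clone coordinates below are hypothetical and chosen only for illustration; they are not taken from Figure 2.

```python
def merge_intervals(clones):
    """Merge overlapping (start_kb, end_kb) clone intervals into contigs."""
    contigs = []
    for start, end in sorted(clones):
        if contigs and start <= contigs[-1][1]:
            # Overlaps the current contig: extend it if the clone reaches further.
            contigs[-1][1] = max(contigs[-1][1], end)
        else:
            # No overlap: a new contig begins (for example at a jump endpoint).
            contigs.append([start, end])
    return [tuple(c) for c in contigs]


def covered_kb(clones):
    """Total length covered by the merged contigs."""
    return sum(end - start for start, end in merge_intervals(clones))


# Hypothetical coordinates in kb: three overlapping walk steps, then a jump
# to about 110 kb followed by two further walk steps.
walk = [(0, 18), (15, 33), (30, 47)]
walk_after_jump = [(110, 128), (125, 140)]

print(merge_intervals(walk + walk_after_jump))  # [(0, 47), (110, 140)]
print(covered_kb(walk + walk_after_jump))       # 77
```

In this picture a bidirectional walk from each jump endpoint eventually closes the gap between contigs, unless the intervening DNA is unclonable.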

The schematic representation of the chromosome 
walking and jumping strategy is illustrated in Figure 2. 
As the two independently isolated DNA markers 
D7S122 (pH131) and D7S340 (TMS8) were only 
approximately 10 kb apart (Figure 2), the walks and jumps 
were essentially initiated from a single point. The 
direction of walking and jumping with respect to MET and 
D7S8 was then established with the crossing of several 
rare-cutting restriction endonuclease recognition sites 
(such as those for [...] Not I; see Figure 2) 
and with reference to the [...] [J. M. Rommens et al, 
Am. J. Hum. Genet., in press; A. M. Poustka et al, 
Genomics 2, [...] (1988); [...]]. The [...] 
data also revealed that the Not I site identified by the 
inventors of the present invention (see Figure 2, 
position 113 kb) corresponded to the one previously found 
associated with the [...] locus (Estivill et al, 1987, 
supra). Since subsequent genetic studies showed that CF 
was most likely located between IRP and D7S8 [M. Farrall 
[...]], [...] this 
interval. It is appreciated, however, that other 
coding regions, as identified in Figure 2, for example 
[...], CF14 and CF[...], were located and extensively 
investigated. Such extensive investigations of these 
other regions revealed that they were not the CF gene 
based on genetic data and sequence analysis. Given the 
lack of knowledge of the location of the CF gene and its 
characteristics, the extensive and time consuming 
examination of the nearby presumptive coding regions did 
not advance the direction of search for the CF gene. 
However, these investigations were necessary in order to 
rule out the possibility of the CF gene being in those 
regions. 

[...] to be readily recoverable in the amplified genomic 
libraries initially used. These less clonable regions 
were located near the DNA segments H2.3A and X.6, and 
just beyond cosmid cW44, at positions 75-100 kb, 205-225 
kb, and 275-285 kb in Figure 2, respectively. The 
recombinant clones near H2.3A were found to be very 
unstable, with dramatic rearrangements after only a few 
passages of bacterial culture. To fill in the resulting 
gaps, primary walking libraries were constructed using 
special host-vector systems which have been reported to 
allow propagation of unstable sequences [A. R. Wyman, L. 
B. Wolfe, D. Botstein, Proc. Natl. Acad. Sci. U.S.A. 82, 
2880 (1985); K. F. Wertman, A. R. Wyman, D. Botstein, 
Gene 49, 253 (1986); A. R. Wyman, K. F. Wertman, D. 
Barker, C. Helms, W. H. Petri, Gene 49, 263 (1986)]. 
Although the region near cosmid cW44 remains to be 
recovered, the region near X.6 was successfully rescued 
with these libraries. 

CONSTRUCTION OF GENOMIC LIBRARIES 

Genomic libraries were constructed after procedures 
described in Maniatis et al, Molecular Cloning: A 
Laboratory Manual (Cold Spring Harbor Laboratory, Cold 
Spring Harbor, New York 1982) and are listed in Table 1. 
This includes eight phage libraries, one of which was 
provided by T. Maniatis [Fritsch et al, Cell 19:959 
(1980)]; the rest were constructed as part of this work 
according to procedures described in Maniatis et al, 
supra. Four phage libraries were cloned in λDASH 
(commercially available from Stratagene) and three in 
λFIX (commercially available from Stratagene), with 
vector arms provided by the manufacturer. One λDASH 
library was constructed from Sau 3A-partially digested 
DNA from a human-hamster hybrid containing human 
chromosome 7 (4AF/102/K015) [Rommens et al, Am. J. Hum. 
Genet. 43, 4 (1988)], and other libraries from partial 
Sau 3A, total Bam HI, or total Eco RI digestion of human 
peripheral blood or lymphoblastoid DNA. To avoid loss of 
unstable sequences, five of the phage libraries were 
propagated on the recombination-deficient hosts DB1316 
(recD-), CES200 (recBC-) [Wyman et al, supra; Wertman et 
al, supra; Wyman et al, supra]; or TAP90 [Patterson et al, 
Nucleic Acids Res. 15:6298 (1987)]. Three cosmid 
libraries were then constructed. In one, the vector 
pCV108 [Lau et al, Proc. Natl. Acad. Sci. USA 80:5225 
(1983)] was used to clone partially digested (Sau 3A) DNA 
from 4AF/102/K015 [Rommens et al, Am. J. Hum. Genet. 43:4 
(1988)]. A second cosmid library was prepared by cloning 
partially digested (Mbo I) human lymphoblastoid DNA into 
the vector pWE-IL2R, prepared by inserting the RSV (Rous 
Sarcoma Virus) promoter-driven cDNA for the interleukin-2 
receptor α-chain (supplied by M. Fordis and B. Howard) in 
place of the neo-resistance gene of pWE15 [Wahl et al, 
Proc. Natl. Acad. Sci. USA 84:2160 (1987)]. An 
additional partial Mbo I cosmid library was prepared in 
the vector pWE-IL2R-Sal, created by inserting a Sal I 
linker into the Bam HI cloning site of pWE-IL2R (M. 
Drumm, unpublished data); this allows the use of the 
partial fill-in technique to ligate Sal I and Mbo I ends, 
preventing tandem insertions [Zabarovsky et al, Gene 42:119 
(1986)]. Cosmid libraries were propagated in E. coli 
host strains DH1 or 490A [M. Steinmetz, A. Winoto, K. 
Minard, L. Hood, Cell 28, 489 (1982)]. 
TABLE 1 
GENOMIC LIBRARIES 

Vector           Source of human DNA               Host     Complexity     Ref. 

[...]            HaeII/AluI-partially digested     LE392    1 x 10^[?]     Lawn et al 
                 total human liver DNA                      (amplified)    1980 

pCV108           Sau3A-partially digested          DH1      3 x 10^[?] 
                 DNA from 4AF/K015                          (amplified) 

λDASH            Sau3A-partially digested          LE392    1 x 10^[?] 
                 DNA from 4AF/K015                          (amplified) 

λDASH            Sau3A-partially digested          DB1316   1.5 x 10^6 
                 total human peripheral 
                 blood DNA 

λDASH            BamHI-digested total              DB1316   1.5 x 10^6 
                 human peripheral blood 
                 DNA 

λDASH            EcoRI-partially digested          DB1316   8 x 10^[?] 
                 total human peripheral 
                 blood DNA 

λFIX             MboI-partially digested           LE392    1.5 x 10^[?] 
                 human lymphoblastoid DNA 

λFIX             MboI-partially digested           CES200   1.2 x 10^[?] 
                 human lymphoblastoid DNA 

λFIX             MboI-partially digested           TAP90    1.3 x 10^6 
                 human lymphoblastoid DNA 

pWE-IL2R         MboI-partially digested           490A     5 x 10^5 
                 human lymphoblastoid DNA 

pWE-IL2R-Sal     MboI-partially digested           490A     1.2 x 10^6 
                 human lymphoblastoid DNA 

λCh3AΔlac        EcoRI-partially digested          MC1061   3 x 10^6       Collins et al, 
(jumping)        (24-110 kb)                                               supra, and 
                 human lymphoblastoid DNA                                  Iannuzzi et al, 
                                                                           supra 


Three of the phage libraries were propagated and 
amplified in E. coli bacterial strain LE392. Four 
subsequent libraries were plated on the recombination- 
deficient hosts DB1316 (recD-) or CES200 (recBC-) [Wyman 
1985, supra; Wertman 1986, supra; and Wyman 1986, supra] 
or in one case TAP90 [T.A. Patterson and M. Dean, Nucleic 
Acids Research 15, 6298 (1987)]. 
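
The library complexities listed in Table 1 can be put in context with the standard Clarke and Carbon estimate, N = ln(1 - P) / ln(1 - f), where f is the fraction of the genome carried by one insert and P is the desired probability that a given sequence is represented. The sketch below is an illustration only: the 20 kb insert size and the 3 x 10^9 bp genome size are assumptions, not values stated in this document, and the formula assumes random, unbiased cloning, which does not hold for the unstable regions discussed above.

```python
import math


def clones_needed(insert_bp, genome_bp, probability):
    """Clarke-Carbon estimate of independent clones needed for a given
    probability that any single-copy sequence is present in the library."""
    f = insert_bp / genome_bp
    return math.ceil(math.log1p(-probability) / math.log1p(-f))


def coverage_probability(n_clones, insert_bp, genome_bp):
    """Probability that a given sequence is present among n_clones."""
    f = insert_bp / genome_bp
    return 1.0 - math.exp(n_clones * math.log1p(-f))


# Assumed values for illustration only.
INSERT_BP = 20_000
GENOME_BP = 3_000_000_000

print(clones_needed(INSERT_BP, GENOME_BP, 0.99))
print(coverage_probability(1_500_000, INSERT_BP, GENOME_BP))
```

Under these assumptions, a library of the order of 10^6 independent clones would be expected to contain most single-copy sequences, which is why sequences that remain unrecovered point to clonability problems rather than to insufficient library size.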

Single copy DNA segments (free of repetitive 
elements) near the ends of each phage or cosmid insert 
were purified and used as probes for library screening to 
isolate overlapping DNA fragments by standard procedures 
(Maniatis et al, supra). 

1-2 x 10^[?] phage clones were plated on 25-30 150 mm 
petri dishes with the appropriate indicator bacterial 
host and incubated at 37°C for 10-16 hr. Duplicate 
"lifts" were prepared for each plate with nitrocellulose 
or nylon membranes, prehybridized and hybridized under 
conditions described [Rommens et al, 1988, supra]. 
Probes were labelled with 32P to a specific activity of >5 
x 10^[?] cpm/µg using the random priming procedure [A. P. 
Feinberg and B. Vogelstein, Anal. Biochem. 132, 6 
(1983)]. The cosmid library was spread on ampicillin- 
containing plates and screened in a similar manner. 

DNA probes which gave high background signals could 
often be used more successfully by preannealing the 
boiled probe with 250 µg/ml sheared denatured placental 
DNA for 60 minutes prior to adding the probe to the 
hybridization bag. 

For each walk step, the identity of the cloned DNA 
fragment was determined by hybridization with a somatic 
cell hybrid panel to confirm its chromosomal location, 
and by restriction mapping and Southern blot analysis to 
confirm its colinearity with the genome. 

The total combined cloned region of the genomic DNA 
sequences isolated and the overlapping cDNA clones 
extended >500 kb. To ensure that the DNA segments 
isolated by the chromosome walking and jumping procedures 
were colinear with the genomic sequence, each segment was 
examined by: 

(a) hybridization analysis with human-rodent somatic 
hybrid cell lines to confirm chromosome 7 localization, 

(b) pulsed field gel electrophoresis, and 

(c) comparison of the restriction map of the cloned 
DNA to that of the genomic DNA. 

Accordingly, single copy human DNA sequences were 
isolated from each recombinant phage and cosmid clone and 
used as probes in each of these hybridization analyses as 
performed by the procedure of Maniatis et al, supra. 
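
Criterion (c) above, comparison of the restriction map of a clone with that of the genome, can be sketched computationally. The example is a simplified illustration with hypothetical function names and toy sequences: it treats a digest as complete, places each cut at the first base of the recognition site, and compares only fragments bounded by two sites, since the end fragments of a clone are defined by the cloning junctions rather than by the genome.

```python
import re


def fragment_sizes(seq, site="GAATTC"):
    """Fragment lengths from a complete digest of a linear sequence.
    Cuts are placed at the first base of each site (a simplification)."""
    cuts = [m.start() for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def is_contiguous_run(sub, full):
    """True if list `sub` occurs as a contiguous run inside list `full`."""
    n = len(sub)
    return any(full[i:i + n] == sub for i in range(len(full) - n + 1))


def colinear(clone_seq, genome_seq, site="GAATTC"):
    """True if the clone's internal fragments match a run of genomic fragments."""
    internal = fragment_sizes(clone_seq, site)[1:-1]
    return is_contiguous_run(internal, fragment_sizes(genome_seq, site))


# Toy sequences (hypothetical), with Eco RI sites separating filler runs.
genome = "A" * 5 + "GAATTC" + "C" * 10 + "GAATTC" + "G" * 4 + "GAATTC" + "T" * 7
faithful_clone = genome[3:40]
rearranged_clone = "AA" + "GAATTC" + "C" * 4 + "GAATTC" + "G" * 4 + "GAATTC" + "TTT"

print(fragment_sizes(genome))            # [5, 16, 10, 13]
print(colinear(faithful_clone, genome))  # True
print(colinear(rearranged_clone, genome))  # False
```

A clone that has suffered a deletion or rearrangement during propagation yields internal fragment sizes that do not appear as a contiguous run in the genomic digest.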

While the majority of phage and cosmid isolates 
represented correct walk and jump clones, a few resulted 
from cloning artifacts or cross-hybridizing sequences 
from other regions in the human genome, or from the 
hamster genome in cases where the libraries were derived 
from a human-hamster hybrid cell line. Confirmation of 
correct localization was particularly important for 
clones isolated by chromosome jumping. Many jump clones 
were considered and resulted in non-conclusive 
information leading the direction of investigation away 
from the gene. 

2.3 CONFIRMATION OF THE RESTRICTION MAP 

Further confirmation of the overall physical map of 
the overlapping clones was obtained by long range 
restriction mapping analysis with the use of pulsed field 
gel electrophoresis (J. M. Rommens et al, Am. J. Hum. 
Genet., in press; A. M. Poustka et al, 1988, supra; M. L. 
Drumm et al, 1988, supra). 

Figures 3A to 3E illustrate the findings of the 
long range restriction mapping study, where a schematic 
representation of the region is given in Panel E. DNA 
from the human-hamster cell line 4AF/102/K015 was 
digested with the enzymes (A) Sal I, (B) Xho I, (C) Sfi I 
and (D) Nae I, separated by pulsed field gel 
electrophoresis, and transferred to Zetaprobe™ (BioRad). 
For each enzyme a single blot was sequentially hybridized 
with the probes indicated below each of the panels of 
Figures 3A to 3D, with stripping of the blot between 
hybridizations. The symbols for each enzyme of Figure 3E 
are: A, Nae I; B, Bss HII; F, Sfi I; L, Sal I; M, Mlu I; 
N, Not I; R, Nru I; and X, Xho I. C corresponds to the 
compression zone region of the gel. DNA preparations, 
restriction digestion, and crossed field gel 
electrophoresis methods have been described (Rommens et 
al, in press, supra). The gels in Figure 3 were run in 
0.5X TBE at 7 volts/cm for 20 hours with switching 
linearly ramped from 10-40 seconds for (A), (B), and (C), 
and at 8 volts/cm for 20 hours with switching ramped 
linearly from 50-150 seconds for (D). Schematic 
interpretations of the hybridization pattern are given 
below each panel. Fragment lengths are in kilobases and 
were sized by comparison to oligomerized bacteriophage 
λ DNA and Saccharomyces cerevisiae chromosomes. 
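
The linearly ramped switching described above, and the ladder of oligomerized bacteriophage λ DNA used for sizing, can be written out as a short sketch. The function names are hypothetical, and the λ monomer length of 48.5 kb is an assumption based on the commonly cited genome size rather than a value given in this document.

```python
def switch_time(t_hours, start_s, end_s, run_hours):
    """Switch interval (seconds) at time t for a linear ramp over the run."""
    if not 0 <= t_hours <= run_hours:
        raise ValueError("t_hours must lie within the run")
    return start_s + (end_s - start_s) * t_hours / run_hours


def lambda_ladder(n_oligomers, monomer_kb=48.5):
    """Sizes (kb) of the first n oligomers of bacteriophage lambda DNA."""
    return [monomer_kb * k for k in range(1, n_oligomers + 1)]


# Panels (A) to (C): 10-40 s ramp over 20 hours.
print(switch_time(10, 10, 40, 20))   # 25.0
# Panel (D): 50-150 s ramp over 20 hours.
print(switch_time(5, 50, 150, 20))   # 75.0
print(lambda_ladder(3))              # [48.5, 97.0, 145.5]
```

Longer switch intervals resolve larger fragments, which is consistent with the longer ramp being used for the Nae I panel.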

H4.0, J44, EG1.4 are genomic probes generated from 
the walking and jumping experiments (see Figure 2). J30 
has been isolated by four consecutive jumps from D7S8 
(Collins et al, 1987; Iannuzzi et al, 1989, supra; 
M. Dean et al, submitted for publication). 10-1, B.75, 
and CE1.5/1.0 are cDNA probes which cover different 
regions of the CF transcript: 10-1 contains exons I - 
VI, B.75 contains exons V - XII, and CE1.5/1.0 contains 
exons XII - XXIV. Shown in Figure 3E is a composite map 
of the entire MET - D7S8 interval. The open boxed region 
indicates the segment cloned by walking and jumping, and 
the closed arrow portion indicates the region covered by 
the CF transcript. The CpG-rich region associated with 
the D7S23 locus (Estivill et al, 1987, supra) is at the 
Not I site shown in parentheses. This and other sites 
shown in parentheses or square brackets do not cut in 
4AF/102/K015, but have been observed in human lymphoblast 
cell lines. 

2.4 IDENTIFICATION OF CF GENE 

Based on the findings of long range restriction 
mapping detailed above it was determined that the entire 
CF gene is contained on a 380 kb Sal I fragment. 
Alignment of the restriction sites derived from pulsed 
field gel analysis to those identified in the partially 
overlapping genomic DNA clones revealed that the size of 
the CF gene was approximately 250 kb. 

The most informative restriction enzyme that served 
to align the map of the cloned DNA fragments and the long 
range restriction map was Xho I; all of the 9 Xho I sites 
identified with the recombinant DNA clones appeared to be 
susceptible to at least partial cleavage in genomic DNA 
(compare maps in Figures 1 and 2). Furthermore, 
hybridization analysis with probes derived from the 3' 
end of the CF gene identified 2 Sfi I sites and confirmed 
the position of an anticipated Nae I site. 

These findings further supported the conclusion that 
the DNA segments isolated by the chromosome walking and 
jumping procedures were colinear with the genomic 
sequence. 

2.5 CRITERIA FOR IDENTIFICATION 

A positive result based on one or more of the 
following criteria suggested that a cloned DNA segment 
may contain candidate gene sequences: 

(a) detection of cross-hybridizing sequences in 
other species (as many genes show evolutionary 
conservation), 

(b) identification of CpG islands, which often mark 
the 5' end of vertebrate genes [A. P. Bird, Nature 321, 
209 (1986); M. Gardiner-Garden and M. Frommer, J. Mol. 
Biol. 196, 261 (1987)], 

(c) examination of possible mRNA transcripts in 
tissues affected in CF patients, 

(d) isolation of corresponding cDNA sequences, 

(e) identification of open reading frames by direct 
sequencing of cloned DNA segments. 
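
Criterion (b) can be made concrete with a short sketch. The thresholds used (length of at least 200 bp, G+C content above 50%, and an observed/expected CpG ratio above 0.6) are those commonly attributed to the Gardiner-Garden and Frommer reference cited above; the function names and test sequences are hypothetical and the code is an illustration, not the procedure used in this work.

```python
def cpg_stats(seq):
    """Return (G+C fraction, observed/expected CpG ratio) for a sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    observed = seq.count("CG")
    expected = c * g / n if n else 0.0
    ratio = observed / expected if expected else 0.0
    gc_fraction = (c + g) / n if n else 0.0
    return gc_fraction, ratio


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_ratio=0.6):
    """Apply the three thresholds to a candidate window."""
    gc_fraction, ratio = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and ratio > min_ratio


# Hypothetical test sequences.
cpg_rich = "CG" * 150          # 300 bp, every C followed by G
cpg_depleted = "CAGT" * 75     # 300 bp, 50% G+C but no CpG dinucleotides

print(cpg_stats(cpg_rich))                  # (1.0, 2.0)
print(looks_like_cpg_island(cpg_rich))      # True
print(looks_like_cpg_island(cpg_depleted))  # False
```

In practice the statistic would be computed in a sliding window along the cloned sequence, and windows passing the thresholds would be checked for clustered rare-cutter sites.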



PCT/CA91/00009 

Cross-species hybridization showed strong sequence 
conservation between human and bovine DNA when CF14, E4.3 
and H1.6 were used as probes, the results of which are 
shown in Figures 4A, 4B and 4C. 

Human, bovine, mouse, hamster, and chicken genomic 
DNAs were digested with Eco RI (R), Hind III (H), and Pst 
I (P), electrophoresed, and blotted to Zetabind™ 
(BioRad). The hybridization procedures of Rommens et al, 
1988, supra, were used with the most stringent wash at 
55°C, 0.2X SSC, and 0.1% SDS. The probes used for 
hybridization, in Figure 4, included: (A) entire cosmid 
CF14, (B) E4.3, (C) H1.6. In the schematic of Figure 
4(D), the shaded region indicates the area of cross- 
species conservation. 

The fact that different subsets of bands were 
detected in bovine DNA with these two overlapping DNA 
segments (H1.6 and E4.3) suggested that the conserved 
sequences were located at the boundaries of the 
overlapped region (Figure 4(D)). When these DNA segments 
were used to detect RNA transcripts from a variety of 
tissues, no hybridization signal was detected. In an 
attempt to understand the cross-hybridizing region and to 
identify possible open reading frames, the DNA sequences 
of the entire H1.6 and part of the E4.3 fragment were 
determined. The results showed that, except for a long 
stretch of CG-rich sequence containing the recognition 
sites for two restriction enzymes (Bss HII and Sac II), 
often found associated with undermethylated CpG islands, 
there were only short open reading frames which could not 
easily explain the strong cross-species hybridization 
signals. 
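
A search for open reading frames in a sequenced genomic segment, as described above, amounts to scanning all six reading frames for the longest run of codons between a start codon and a stop codon. The following is a minimal sketch with hypothetical function names and a toy sequence; it counts only frames that begin with ATG and end at an in-frame stop, which is a simplifying assumption, since an exon in genomic DNA need not contain the initiator codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf_codons(seq):
    """Length in codons (ATG included, stop excluded) of the longest
    ATG-to-stop open reading frame over all six frames."""
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            run = None  # codons counted since the current ATG, if any
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if run is None:
                    if codon == "ATG":
                        run = 1
                elif codon in STOP_CODONS:
                    best = max(best, run)
                    run = None
                else:
                    run += 1
    return best


# Toy sequence: ATG, five GCT codons, then a TAA stop, in frame 2.
toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "G"
print(longest_orf_codons(toy))  # 6
```

Short values returned across a conserved segment, as found here, suggest that the conservation reflects something other than a long protein-coding exon.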

To examine the methylation status of this highly 
CpG-rich region revealed by sequencing, genomic DNA 
samples prepared from fibroblasts and lymphoblasts were 
digested with the restriction enzymes Hpa II and Msp I 
and analyzed by gel blot hybridization. The enzyme Hpa 
II cuts the DNA sequence 5'-CCGG-3' only when the second 
cytosine is unmethylated, whereas Msp I cuts this 
sequence regardless of the state of methylation. Small 
DNA fragments were generated by both enzymes, indicating 
that this CpG-rich region is indeed undermethylated in 
genomic DNA. The gel-blot hybridization with the E4.3 
segment (Figure 6) reveals very small hybridizing 
fragments with both enzymes, indicating the presence of a 
hypomethylated CpG island. 
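
The logic of the Hpa II / Msp I comparison can be sketched as follows. The model follows the description in the text: both enzymes recognize 5'-CCGG-3', Hpa II is blocked when the second cytosine is methylated, and Msp I cuts regardless. The function name, the toy sequence and the methylation positions are hypothetical.

```python
import re


def ccgg_fragments(seq, enzyme, methylated=frozenset()):
    """Fragment lengths after digestion at CCGG sites.

    `methylated` holds 0-based positions of methylated cytosines.
    Both enzymes are modelled as cutting between the two cytosines.
    """
    if enzyme not in ("HpaII", "MspI"):
        raise ValueError("enzyme must be 'HpaII' or 'MspI'")
    cuts = []
    for match in re.finditer("(?=CCGG)", seq):
        second_c = match.start() + 1
        if enzyme == "MspI" or second_c not in methylated:
            cuts.append(match.start() + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Toy sequence with two CCGG sites (second cytosines at positions 5 and 15).
toy = "AAAA" + "CCGG" + "TTTTTT" + "CCGG" + "AA"

print(ccgg_fragments(toy, "MspI", methylated={5, 15}))   # [5, 10, 5]
print(ccgg_fragments(toy, "HpaII", methylated={5, 15}))  # [20]
print(ccgg_fragments(toy, "HpaII"))                      # [5, 10, 5]
```

When the two enzymes give the same small fragments, as observed with the E4.3 probe, the sites are inferred to be unmethylated; a methylated region would give small fragments with Msp I only.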

The above results strongly suggest the presence of a 
coding region at this locus. Two DNA segments (E4.3 and 
H1.6) which detected cross-species hybridization signals 
from this area were used as probes to screen cDNA 
libraries made from several tissues and cell types. 

cDNA libraries from cultured epithelial cells were 
prepared as follows. Sweat gland cells derived from a 
non-CF individual and from a CF patient were grown to 
first passage as described [G. Collie et al, In Vitro 
Cell. Dev. Biol. 21, 592, 1985]. The presence of 
outwardly rectifying channels was confirmed in these 
cells (J.A. Tabcharani, T.J. Jensen, J.R. Riordan, J.W. 
Hanrahan, J. Memb. Biol., in press) but the CF cells were 
insensitive to activation by cyclic AMP (T.J. Jensen, 
J.W. Hanrahan, J.A. Tabcharani, M. Buchwald and J.R. 
Riordan, Pediatric Pulmonology, Supplement 2, 100, 1988). 
RNA was isolated from them by the method of J.M. Chirgwin 
et al (Biochemistry 18, 5294, 1979). Poly A+ RNA was 
selected (H. Aviv and P. Leder, Proc. Natl. Acad. Sci. 
USA 69, 1408, 1972) and used as template for the 
synthesis of cDNA with oligo(dT) 12-18 as a primer. The 
second strand was synthesized according to Gubler and 
Hoffman (Gene 25, 263, 1983). This was methylated with 
Eco RI methylase and ends were made flush with T4 DNA 
polymerase. Phosphorylated Eco RI linkers were ligated 
to the cDNA and restricted with Eco RI. Removal of 
excess linkers and partial size fractionation was 
achieved by Biogel A-50 chromatography. The cDNAs were 
then ligated into the Eco RI site of the commercially 
available lambda ZAP. Recombinants were packaged and 
propagated in BB4. Portions of the packaging 
mixes were amplified and the remainder retained for 
screening prior to amplification. The same procedures 
were used to construct a library from RNA isolated from 
preconfluent cultures of the T-84 colonic carcinoma cell 
line (Dharmsathaphorn, K. et al, Am. J. Physiol. 246, 
G204, 1984). The numbers of independent recombinants in 
the three libraries were: 2 x 10^[?] for the non-CF sweat 
gland cells, 4.5 x 10^[?] for the CF sweat gland cells and 
3.2 x 10^[?] from T-84 cells. These phages were plated at 
50,000 per 15 cm plate and plaque lifts made using nylon 
membranes (Biodyne) and probed with DNA fragments 
labelled with 32P using DNA polymerase I and a random 
mixture of oligonucleotides as primer. Hybridization 
conditions were according to G.M. Wahl and S.L. Berger 
(Meth. Enzymol. 152, 415, 1987). Bluescript plasmids 
were rescued from plaque purified clones by excision with 
M13 helper phage. The lung and pancreas libraries were 
purchased from Clontech Lab Inc. with reported sizes of 
1.4 x 10^[?] and 1.7 x 10^[?] independent clones. 

After screening 7 different libraries each 
containing 1 x 10^[?] to 5 x 10^[?] independent clones, a single 
clone (identified as 10-1) was isolated with H1.6 from a 
cDNA library made from the cultured sweat gland 
epithelial cells of an unaffected (non-CF) individual. 

DNA sequencing analysis showed that clone 10-1 
contained an insert of 920 bp in size and one potential 
long open reading frame (ORF). Since one end of the 
sequence shared perfect sequence identity with H1.6, it 
was concluded that the cDNA clone was probably derived 
from this region. The DNA sequence in common was, 
however, only 113 bp long (see Figures 1 and 7). As 
detailed below, this sequence in fact corresponded to the 
5'-most exon of the putative CF gene. The short sequence 
overlap thus explained the weak hybridization signals in 
library screening and the inability to detect transcripts in 
RNA gel-blot analysis. In addition, the orientation of 
the transcription unit was tentatively established on the 
basis of alignment of the genomic DNA sequence with the 
presumptive ORF of 10-1. 

Since the corresponding transcript was estimated to 
be approximately 6500 nucleotides in length by RNA gel- 
blot hybridization experiments, further cDNA library 
screening was required in order to clone the remainder of 
the coding region. As a result of several successive 
screenings with cDNA libraries generated from the colonic 
carcinoma cell line T84, normal and CF sweat gland cells, 
pancreas and adult lungs, 18 additional clones were 
isolated (Figure 7, as subsequently discussed in greater 
detail). DNA sequence analysis revealed that none of 
these cDNA clones corresponded to the length of the 
observed transcript, but it was possible to derive a 
consensus sequence based on overlapping regions. 
Additional cDNA clones corresponding to the 5' and 3' 
ends of the transcript were derived from 5' and 3' 
primer-extension experiments. Together, these clones 
span a total of about 6.1 kb and contain an ORF capable 
of encoding a polypeptide of 1480 amino acid residues 
(Figure 1). 

It was unusual to observe that most of the cDNA 
clones isolated here contained sequence insertions at 
various locations of the restriction map of Figure 7. 
The map details the genomic structure of the CF gene. 
Exon/intron boundaries are given where all cDNA clones 
isolated are schematically represented on the upper half 
of the figure. Many of these extra sequences clearly 
corresponded to intron regions reverse transcribed 
during the construction of the cDNA, as revealed upon 
alignment with genomic DNA sequences. 

Since the number of recombinant cDNA clones for the 
CF gene detected in the library screening was much less 
than would have been expected from the abundance of 
transcript estimated from RNA hybridization experiments, 
it seemed probable that the clones that contained 
aberrant structures were preferentially retained while 
the proper clones were lost during propagation. 
Consistent with this interpretation, poor growth was 
observed for the majority of the recombinant clones 
isolated in this study, regardless of the vector used. 

The procedures used to obtain the 5' and 3' ends of 
the cDNA were similar to those described (M. Frohman et 
al, Proc. Natl. Acad. Sci. USA, 85, 8998-9002, 1988). For 
the 5' end clones, total pancreas and T84 poly A+ RNA 
samples were reverse transcribed using a primer (10b) 
which is specific to exon 2, similarly as has been 
described for the primer extension reaction except that 
radioactive tracer was included in the reaction. The 
fractions collected from an agarose bead column of the 
first strand synthesis were assayed by polymerase chain 
reaction (PCR). The oligonucleotides 
used were within the 10-1 sequence (145 nucleotides 
apart) just 5' of the extension primer. The earliest 
fractions yielding PCR product were pooled and 
concentrated by evaporation and subsequently tailed with 
terminal deoxynucleotidyl transferase (BRL Labs) and 
dATP as recommended by the supplier. A second 
strand synthesis was then carried out with Taq Polymerase 
(Cetus, AmpliTaq™) using an oligonucleotide containing a 
tailed linker sequence 5' CGGAATTCTCGAGATC(T)12 3'. 
Amplification by an anchored PCR experiment using 
the linker sequence and a primer just internal to the 
extension primer which possessed the Eco RI restriction 
site at its 5' end was then carried out. Following 
restriction with the enzymes Eco RI and Bgl II and 
agarose gel purification, size selected products were 
cloned into the plasmid Bluescript KS available from 
Stratagene by standard procedures (Maniatis et al, 
supra). Essentially all of the recovered clones 
contained inserts of less than 350 nucleotides. To 
obtain the 3' end clones, first strand cDNA was prepared 
with reverse transcription of 2 µg of T84 poly A+ RNA using 
the tailed linker oligonucleotide previously described 
with conditions similar to those of the primer extension. 
Amplification by PCR was then carried out with the linker 
oligonucleotide and three different oligonucleotides 
corresponding to known sequences of clone T16-4.5. A 
preparative scale reaction (2 x 100 µl) was carried out 
with one of these oligonucleotides with the sequence 
5' ATGAAGTCCAAGGATTTAG 3'. 
This oligonucleotide is approximately 70 nucleotides 
upstream of a Hind III site within the known sequence of 
T16-4.5. Restriction of the PCR product with Hind III 
and Xho I was followed by agarose gel purification to 
size select a band at 1.0-1.4 kb. This product was then 
cloned into the plasmid Bluescript KS available from 
Stratagene. Approximately 20% of the obtained clones 
hybridized to the 3' end portion of T16-4.5. 10/10 of 
plasmids isolated from these clones had identical 
restriction maps with insert sizes of approx. 1.2 kb. 
All of the PCR reactions were carried out for 30 cycles 
in buffer suggested by an enzyme supplier. 
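
The restriction sites carried by the tailed linker can be located directly from its sequence as given above. The sketch below is illustrative: the recognition sequences for Eco RI, Xho I, Bgl II and Hind III are the standard ones, while the function name is hypothetical.

```python
import re

# Standard recognition sequences for the enzymes named in the text.
SITES = {
    "EcoRI": "GAATTC",
    "XhoI": "CTCGAG",
    "BglII": "AGATCT",
    "HindIII": "AAGCTT",
}

# Tailed linker 5' CGGAATTCTCGAGATC(T)12 3' as given in the text.
LINKER = "CGGAATTCTCGAGATC" + "T" * 12


def find_sites(seq, sites=SITES):
    """Map each enzyme name to the 0-based start positions of its site."""
    return {
        name: [m.start() for m in re.finditer(f"(?={pattern})", seq)]
        for name, pattern in sites.items()
    }


print(len(LINKER))         # 28
print(find_sites(LINKER))
# {'EcoRI': [2], 'XhoI': [7], 'BglII': [11], 'HindIII': []}
```

The linker thus supplies overlapping Eco RI and Xho I sites, and a Bgl II site is formed where the GATC of the linker meets the first T of the tail, which is consistent with the enzyme pairs used for cloning the 5' and 3' end products.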

An extension primer positioned 157 nt from the 5' end 
of the 10-1 clone was used to identify the start point of the 
putative CF transcript. The primer was end labelled with 
γ-[32P]ATP at 5000 Curies/mole and T4 polynucleotide kinase 
and purified by spun column gel filtration. The 
radiolabeled primer was then annealed with 4-5 µg poly A+ 
RNA prepared from T-84 colonic carcinoma cells in 2X 
reverse transcriptase buffer for 2 hrs. at 60°C. 
Following dilution and addition of AMV reverse 
transcriptase (Life Sciences, Inc.) incubation at 41°C 
proceeded for 1 hour. The sample was then adjusted to 
0.4M NaOH and 20 mM EDTA, and finally neutralized with 
NH4OAc, pH 4.6, phenol extracted, ethanol precipitated, 
redissolved in buffer with formamide, and analyzed on a 
polyacrylamide sequencing gel. Details of these methods 
have been described (Meth. Enzymol. 152, 1987, Eds. S.L. 
Berger, A.R. Kimmel, Academic Press, N.Y.). 

Results of the primer extension experiment using an 
extension oligonucleotide primer starting 157 nucleotides 
from the 5' end of 10-1 are shown in Panel A of Figure 10. 
End labelled φX174 bacteriophage digested with Hae III 
(BRL Labs) is used as size marker. Two major products 
are observed at 216 and 100 nucleotides. The sequence 
corresponding to 100 nucleotides in 10-1 corresponds to a 
very GC rich sequence (11/12) suggesting that this could 
be a reverse transcriptase pause site. The 5' anchored 
PCR results are shown in panel B of Figure 10. The 1.4% 
agarose gel shown on the left was blotted and transferred 
to Zetaprobe™ membrane (Bio-Rad Lab). DNA gel blot 
hybridization with radiolabeled 10-1 is shown on the 
right. The 5' extension products are seen to vary in 
size from 170-280 nt with the major product at about 200 
nucleotides. The PCR control lane shows a fragment of 
145 nucleotides; it was obtained by using the test 
oligomers within the 10-1 sequence. The size markers 
shown correspond to sizes of 154, 220/210, 298, 344, 394 
nucleotides (1 kb ladder purchased from BRL Lab). 

The schematic shown below Panel B of Figure 10 
outlines the procedure to obtain double stranded cDNA 
used for the amplification and cloning to generate the 
clones PA3-5 and TB2-7 shown in Figure 7. The anchored 
PCR experiments to characterize the 3' end are shown in 
panel C. As depicted in the schematic below Figure 10C, 
three primers whose relative position to each other were 
known were used for amplification with reverse 
transcribed T84 RNA as described. These products were 
separated on a 1% agarose gel and blotted onto nylon 
membrane as described above. DNA-blot hybridization with 
the 3' portion of the T16-4.5 clone yielded bands of 
sizes that corresponded to the distance between the 
specific oligomer used and the 3' end of the transcript. 
These bands in lanes 1, 2a and 3 are shown schematically 
below Panel C in Figure 10. The band in lane 3 is weak 
as only 60 nucleotides of this segment overlaps with the 
probe used. Also indicated in the schematic and as shown 
in lane 2b is the product generated by restriction of 
the anchored PCR product to facilitate cloning to 
generate the THZ-4 clone shown in Figure 7. 

DNA-blot hybridization analysis of genomic DNA 
digested with EcoRI and HindIII enzymes probed with 
portions of cDNAs spanning the entire transcript suggests 
that the gene contains at least 26 exons numbered as 
Roman numerals I through XXVI (see Figure 9). These 
correspond to the numbers 1 through 26 shown in Figure 7. 
The size of each band is given in kb. 

In Figure 7, open boxes indicate approximate 
positions of the 24 exons which have been identified by 
the isolation of >22 clones from the screening of cDNA 
libraries and from anchored PCR experiments designed to 
clone the 5' and 3' ends. The lengths in kb of the Eco 
RI genomic fragments detected by each exon are also 
indicated. The hatched boxes in Figure 7 indicate the 
presence of intron sequences and the stippled boxes 
indicate other sequences. Depicted in the lower left by 
the closed box is the relative position of the clone H1.6 
used to detect the first cDNA clone 10-1 from among 10^[?] 
phage of the normal sweat gland library. As shown in 
Figures 4(D) and 7, the genomic clone H1.6 partially 
overlaps with an EcoRI fragment of 4.3 kb. All of the 
cDNA clones shown were hybridized to genomic DNA and/or 
were fine restriction mapped. Examples of the 
restriction sites occurring within the cDNAs and in the 
corresponding genomic fragments are indicated. 

With reference to Figure 9, the hybridization
analysis uses the following cDNA probes: 10-1 for
panel A, T16-1 (3' portion) for panel B, T16-4.5 (central
portion) for panel C and T16-4.5 (3' end portion) for
panel D. In panel A of Figure 9, the cDNA probe 10-1
detects the genomic bands for exons I through VI. The 3'
portion of T16-1 generated by NruI restriction detects
exons IV through XIII as shown in panel B. This probe
partially overlaps with 10-1. Panels C and D,
respectively, show genomic bands detected by the central
and 3' end EcoRI fragments of the clone T16-4.5. Two
EcoRI sites occur within the cDNA sequence and split
exons XIII and XIX. As indicated by the exons in
parentheses, two genomic EcoRI bands correspond to each
of these exons. Cross hybridization to other genomic
fragments was observed. These bands, indicated by N, are
not of chromosome 7 origin as they did not appear in
human-hamster hybrids containing human chromosome 7. The
faint band in panel D indicated by XI in brackets is
believed to be caused by the cross-hybridization of
sequences due to internal homology with the cDNA.
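
The effect of internal EcoRI sites on the number of bands can be illustrated with a minimal sketch. It assumes a linear sequence and the standard EcoRI recognition site GAATTC (cut after the G); the toy sequence and the function name are hypothetical and are not taken from the CF cDNA.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence; the enzyme cuts G^AATTC


def ecori_fragments(seq):
    """Cut a linear DNA sequence at every EcoRI site and return the fragments."""
    seq = seq.upper()
    cuts = []
    start = seq.find(ECORI_SITE)
    while start != -1:
        cuts.append(start + 1)  # cut falls between G and AATTC
        start = seq.find(ECORI_SITE, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


# Hypothetical toy sequence containing two EcoRI sites.
toy_cdna = "ATGCCGAATTCTTAGGCATGAATTCCGTA"
fragments = ecori_fragments(toy_cdna)
print(len(fragments), [len(f) for f in fragments])
```

Two internal sites give three fragments, which is why an exon that carries an EcoRI site is represented by two genomic EcoRI bands.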

Since 10-1 detected a strong band on gel blot
hybridization of RNA from the T-84 colonic carcinoma cell
line, this cDNA was used to screen the library
constructed from that source. Fifteen positives were
obtained, from which clones T6, T6/20, T11, T16-1 and T13-1
were purified and sequenced. Rescreening of the same
library with a 0.75 kb BamHI-EcoRI fragment from the 3'
end of T16-1 yielded T16-4.5. A 1.8 kb EcoRI fragment
from the 3' end of T16-4.5 yielded T8-B3 and T12a, the
latter of which contained a polyadenylation signal and
tail. Simultaneously a human lung cDNA library was
screened; many clones were isolated, including those shown
here with the prefix 'CDL'. A pancreas library was also
screened, yielding clone CDPJ5.

To obtain copies of this transcript from a CF
patient, a cDNA library from RNA of sweat gland
epithelial cells from a patient was screened with the
0.75 kb BamHI-EcoRI fragment from the 3' end of T16-1
and clones C16-1 and C1-1/5, which covered all but exon
1, were isolated. These two clones both exhibit a 3 bp
deletion in exon 10 which is not present in any other
clone containing that exon. Several clones, including
CDLS26-1 from the lung library and T6/20 and T13-1
isolated from T84, were derived from partially processed
transcripts. This was confirmed by genomic hybridization
and by sequencing across the exon-intron boundaries for
each clone. T11 also contained additional sequence at
each end. T16-4.5 contained a small insertion near the
boundary between exons 10 and 11 that did not correspond
to intron sequence. Clones CDLS16A, 11a and 13a from the
lung library also contained extraneous sequences of
unknown origin. The clone C16-1 also contained a short
insertion corresponding to a portion of the γ-transposon
of E. coli; this element was not detected in the other
clones. The 5' clones PA3-5, generated from pancreas RNA,
and TB2-7, generated from T84 RNA using the anchored PCR
technique, have identical sequences except for a single
nucleotide difference in length at the 5' end as shown in
Figure 1. The 3' clone, THZ-4, obtained from T84 RNA,
contains the 3' sequence of the transcript in concordance
with the genomic sequence of this region.

A combined sequence representing the presumptive
coding region of the CF gene was generated from
overlapping cDNA clones. Since most of the cDNA clones
were apparently derived from unprocessed transcripts,
further studies were performed to ensure the authenticity
of the combined sequence. Each cDNA clone was first
tested for localization to chromosome 7 by hybridization
analysis with a human-hamster somatic cell hybrid
containing a single human chromosome 7 and by pulsed
field gel electrophoresis. Fine restriction enzyme
mapping was also performed for each clone. While
overlapping regions were clearly identifiable for most of
the clones, many contained regions of unique restriction
patterns.

To further characterize these cDNA clones, they were
used as probes in gel hybridization experiments with
EcoRI- or HindIII-digested human genomic DNA. As shown in
Figure 9, five to six different restriction fragments
could be detected with the 10-1 cDNA and a similar number
of fragments with other cDNA clones, suggesting the
presence of multiple exons for the putative CF gene. The
hybridization studies also identified those cDNA clones
with unprocessed intron sequences as they showed
preferential hybridization to a subset of genomic DNA
fragments. For the confirmed cDNA clones, their
corresponding genomic DNA segments were isolated and the
exons and exon/intron boundaries sequenced. As indicated
in Figure 7, at least 27 exons have been identified, which
include split exons 6a, 6b, 14a, 14b and 17a, 17b.
Based on this information and the results of physical
mapping experiments, the gene locus was estimated to span
250 kb on chromosome 7.

2.6 THE SEQUENCE

Figure 1 shows the nucleotide sequence of the cloned
cDNA encoding CFTR together with the deduced amino acid
sequence. The first base position corresponds to the
first nucleotide in the 5' extension clone PA3-5, which is
one nucleotide longer than TB2-7. Arrows indicate the
position of the transcription initiation site determined
by primer extension analysis. Nucleotide 6129 is followed
by a poly(dA) tract. Positions of exon junctions are
indicated by vertical lines. Potential membrane-spanning
segments were ascertained using the algorithm of
Eisenberg et al, J. Mol. Biol. 179:125 (1984). Potential
membrane-spanning segments as analyzed and shown in
Figure 11 are enclosed in boxes in Figure 1. In Figure
11, the mean hydropathy index [Kyte and Doolittle, J.
Molec. Biol. 157:105 (1982)] of 9-residue peptides is
plotted against the amino acid number. The corresponding
positions of features of secondary structure predicted
according to Garnier et al [J. Molec. Biol. 157, 165
(1982)] are indicated in the lower panel. Amino acids
comprising putative ATP-binding folds are underlined in
Figure 1. Possible sites of phosphorylation by protein
kinases A (PKA) or C (PKC) are indicated by open and
closed circles, respectively. The open triangle is over
the 3 bp (CTT) which are deleted in CF (see discussion
below). The cDNA clones in Figure 1 were sequenced by
the dideoxy chain termination method employing 35S-
labelled nucleotides or by the Dupont Genesis 2000™
automatic DNA sequencer.
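
The mean hydropathy index over 9-residue peptides can be sketched as a sliding-window average. The block below is an illustrative sketch only: it uses the published Kyte and Doolittle hydropathy values, while the function name and the toy peptide are assumptions and do not reproduce the analysis behind Figure 11.

```python
# Kyte-Doolittle hydropathy values per amino acid (one-letter code).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(protein, window=9):
    """Return the mean hydropathy of every window-length peptide, in order."""
    scores = [KD[residue] for residue in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Hypothetical peptide: a hydrophobic stretch flanked by charged residues.
toy_peptide = "KKDEILLVVAILLIVGDEKR"
profile = mean_hydropathy(toy_peptide)
peak = max(range(len(profile)), key=profile.__getitem__)
print(peak, round(profile[peak], 2))
```

Windows with a high mean value mark candidate membrane-spanning segments; the threshold used to call a segment is a separate choice that is not specified here.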

The combined cDNA sequence spans 6129 base pairs
excluding the poly(A) tail at the end of the 3'
untranslated region and it contains an ORF capable of
encoding a polypeptide of 1480 amino acids (Figure 1).
An ATG (AUG) triplet is present at the beginning of this
ORF (base position 133-135). Since the nucleotide
sequence surrounding this codon (5'-AGACCAUGCA-3') has
the proposed features of the consensus sequence
(CC)A/GCCAUGG(G) of a eukaryotic translation initiation site
with a highly conserved A at the -3 position, it is
highly probable that this AUG corresponds to the first
methionine codon for the putative polypeptide.
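
As a rough illustration of the argument about the initiation context, the sketch below inspects the -3 and +4 positions around the AUG in the 10-nucleotide context quoted above, and derives the last base of the ORF from the stated start position (133) and length (1480 codons plus one stop codon). The function name and the returned keys are hypothetical, and the end position is an arithmetic consequence of those two figures rather than a value quoted from Figure 1.

```python
def start_codon_context(mrna, aug_index):
    """Report the bases at -3 and +4 relative to the A of an AUG (A = +1)."""
    if mrna[aug_index:aug_index + 3] != "AUG":
        raise ValueError("no AUG at the given index")
    minus3 = mrna[aug_index - 3] if aug_index >= 3 else None
    plus4 = mrna[aug_index + 3] if aug_index + 3 < len(mrna) else None
    return {
        "minus3": minus3,
        "purine_at_minus3": minus3 in ("A", "G"),
        "plus4": plus4,
        "g_at_plus4": plus4 == "G",
    }


context = "AGACCAUGCA"  # sequence surrounding the proposed initiator codon
info = start_codon_context(context, context.find("AUG"))
print(info)

# ORF extent implied by the text: first base 133, 1480 codons plus a stop codon.
orf_start = 133
orf_end = orf_start + 3 * (1480 + 1) - 1
print(orf_start, orf_end)
```

The context carries the conserved A at -3 but not the G at +4, which is in keeping with the wording that the sequence has the "proposed features" of the consensus rather than matching it exactly.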

To obtain the sequence corresponding to the 5' end
of the transcript, a primer-extension experiment was
performed, as described earlier. As shown in Figure 10A,
a primer extension product of approximately 216
nucleotides could be observed, suggesting that the 5' end
of the transcript initiated approximately 60 nucleotides
upstream of the end of cDNA clone 10-1. A modified
polymerase chain reaction (anchored PCR) was then used to
facilitate cloning of the 5'-end sequences (Figure 10B).
Two independent 5'-extension clones, one from pancreas
and the other from T84 RNA, were characterized by DNA
sequencing and were found to differ by only 1 base in
length, indicating the most probable initiation site for
the transcript as shown in Figure 1.

Since most of the initial cDNA clones did not
contain a polyA tail indicative of the end of a mRNA,
anchored PCR was also applied to the 3' end of the
transcript (Frohman et al, 1988, supra). Three 3'-
extension oligonucleotides were made to the terminal
portion of the cDNA clone T16-4.5. As shown in Figure
10C, 3 PCR products of different sizes were obtained.
All were consistent with the interpretation that the end
of the transcript was approximately 1.2 kb downstream of
the HindIII site at nucleotide position 5027 (see Figure
1). The DNA sequence derived from representative clones
was in agreement with that of the T84 cDNA clone T12a
(see Figures 1 and 7) and the sequence of the
corresponding 2.3 kb EcoRI genomic fragment.

3.0 MOLECULAR GENETICS OF CF
3.1 SITES OF EXPRESSION

To visualize the transcript for the putative CF
gene, RNA gel blot hybridization experiments were
performed with the 10-1 cDNA as probe. The RNA
hybridization results are shown in Figure 8.

RNA samples were prepared from tissue samples
obtained from surgical pathology or at autopsy according
to methods previously described (A.M. Kimmel, S.L.
Berger, eds., Meth. Enzymol. 152, 1987). Formaldehyde
gels were transferred onto nylon membranes (Zetaprobe™;
BioRad Lab). The membranes were then hybridized with DNA
probes labeled to high specific activity by the random
priming method (A.P. Feinberg and B. Vogelstein, Anal.
Biochem. 132, 6, 1983) according to previously published
procedures (J. Rommens et al, Am. J. Hum. Genet. 43, 645-
663, 1988). Figure 8 shows hybridization by the cDNA
clone 10-1 to a 6.5 kb transcript in the tissues
indicated. Total RNA (10 µg) of each tissue, and Poly A+
RNA (1 µg) of the T84 colonic carcinoma cell line were
separated on a 1% formaldehyde gel. The positions of the
28S and 18S rRNA bands are indicated. Arrows indicate
the position of transcripts. Sizing was established by
comparison to standard RNA markers (BRL Labs). HL60 is a
human promyelocytic leukemia cell line, and T84 is a
human colon cancer cell line.

Analysis reveals a prominent band of approximately
6.5 kb in size in T84 cells. Similar, strong
hybridization signals were also detected in pancreas and
primary cultures of cells from nasal polyps, suggesting
that the mature mRNA of the putative CF gene is
approximately 6.5 kb. Minor hybridization signals,
probably representing degradation products, were detected
at the lower size ranges but they varied between
different experiments. Identical results were obtained
with other cDNA clones as probes. Based on the
hybridization band intensity and comparison with those
detected for other transcripts under identical
experimental conditions, it was estimated that the
putative CF transcripts constituted approximately 0.01%
of total mRNA in T84 cells.

A number of other tissues were also surveyed by RNA
gel blot hybridization analysis in an attempt to
correlate the expression pattern of the 10-1 gene and the
pathology of CF. As shown in Figure 8, transcripts, all
of identical size, were found in lung, colon, sweat
glands (cultured epithelial cells), placenta, liver, and
parotid gland, but the signal intensities in these tissues
varied among different preparations and were generally
weaker than that detected in the pancreas and nasal
polyps. Intensity varied among different preparations;
for example, hybridization in kidney was not detected in
the preparation shown in Figure 8, but could be discerned
in repeated assays. No hybridization signals
could be discerned in the brain or adrenal gland (Figure
8), nor in skin fibroblast and lymphoblast cell lines.

In summary, expression of the CF gene appeared to
occur in many of the tissues examined, with higher levels
in those tissues severely affected in CF. While this
epithelial tissue-specific expression pattern is in good
agreement with the disease pathology, no significant
difference has been detected in the amount or size of
transcripts from CF and control tissues, consistent with
the assumption that CF mutations are subtle changes at
the nucleotide level.

3.2 THE MAJOR CF MUTATION

Figure 16 shows the DNA sequence at the F508
deletion. On the left is the reverse complement of the
sequence from base position 1649-1664 of the normal
sequence (as derived from the cDNA clone T16). The
nucleotide sequence is displayed as the output (in
arbitrary fluorescence intensity units, y-axis) plotted
against time (x-axis) for each of the 2 photomultiplier
tubes (PMT/1 and /2) of a Dupont Genesis 2000™ DNA
analysis system. The corresponding nucleotide sequence
is shown underneath. On the right is the same region
from a mutant sequence (as derived from the cDNA clone
C16). Double-stranded plasmid DNA templates were
prepared by the alkaline lysis procedure. Five µg of
plasmid DNA and 75 ng of oligonucleotide primer were used
in each sequencing reaction according to the protocol
recommended by Dupont, except that the annealing was done
at 45°C for 30 min and that the elongation/termination
step was for 10 min at 42°C. The unincorporated
fluorescent nucleotides were removed by precipitation of
the DNA sequencing reaction product with ethanol in the
presence of 2.5 M ammonium acetate at pH 7.0, followed by
one rinse with 70% ethanol. The primer used for the T16-1
sequencing was a specific oligonucleotide
5'GTTGGCATGCTTTGATGACGCTTC3' spanning base position
1708-1731 and that for C16-1 was the universal primer
SK for the Bluescript vector (Stratagene).

Figure 17 also shows the DNA sequence around the
F508 deletion, as determined by manual sequencing. The
normal sequence from base position 1726-1651 (from cDNA
T16-1) is shown beside the CF sequence (from cDNA C16-1).
The left panel shows the sequences from the coding
strands obtained with the B primer
(5'GTTTTCCTGGATTATGCCTGGCAC3') and the right panel those
from the opposite strand with the D primer
(5'GTTGGCATGCTTTGATGACGCTTC3'). The brackets indicate
the three nucleotides in the normal that are absent in CF
(arrowheads). Sequencing was performed as described in
F. Sanger, S. Nicklen, A.R. Coulson, Proc. Natl. Acad.
Sci. U.S.A. 74:5463 (1977).
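
The consequence of removing three nucleotides in frame can be illustrated with a short sketch. The 15-nucleotide window below is the commonly cited normal coding sequence around the deleted CTT; it is supplied here as an assumption for illustration and is not copied from Figures 16 or 17. The codon table is deliberately restricted to the five codons needed.

```python
# Minimal codon table covering only the codons that occur in this window.
CODON_TABLE = {"ATC": "I", "ATT": "I", "TTT": "F", "GGT": "G", "GTT": "V"}


def translate(dna):
    """Translate a DNA string codon by codon, ignoring a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Assumed normal coding-strand window around the deletion.
normal = "ATCATCTTTGGTGTT"

# Remove the first CTT triplet, mirroring the 3 bp deletion.
cut = normal.find("CTT")
mutant = normal[:cut] + normal[cut + 3:]

normal_peptide = translate(normal)
mutant_peptide = translate(mutant)
print(normal, normal_peptide)
print(mutant, mutant_peptide)
```

Because the deletion is a multiple of three, the reading frame downstream is preserved and the only change in this window is the loss of a single phenylalanine.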

The extensive genetic and physical mapping data have
directed molecular cloning studies to focus on a small
segment of DNA on chromosome 7. Because of the lack of
chromosome deletions and rearrangements in CF and the
lack of a well-developed functional assay for the CF gene
product, the identification of the CF gene required a
detailed characterization of the locus itself and
comparison between the CF and normal (N) alleles.
Random, phenotypically normal individuals could not be
included as controls in the comparison due to the high
frequency of symptomless carriers in the population. As
a result, only parents of CF patients, each of whom by
definition carries an N and a CF chromosome, were
suitable for the analysis. Moreover, because of the
strong allelic association observed between CF and some
of the closely linked DNA markers, it was necessary to
exclude the possibility that sequence differences
detected between N and CF were polymorphisms associated
with the disease locus.

3.3 IDENTIFICATION OF RFLPs AND FAMILY STUDIES

To determine the relationship of each of the DNA
segments isolated from the chromosome walking and jumping
experiments to CF, restriction fragment length
polymorphisms (RFLPs) were identified and used to study
families where crossover events had previously been
detected between CF and other flanking DNA markers. As
shown in Figure 14, a total of 18 RFLPs were detected in
the 500 kb region; 17 of them (from E6 to CE1.0) are listed
in Table 2, and some of them correspond to markers previously
reported.

Five of the RFLPs, namely those detected by 10-1X.6 (two
RFLPs), T6/20, H1.3 and CE1.0, were identified with cDNA
and genomic DNA probes
derived from the putative CF gene. The RFLP data are
presented in Table 2, with markers in the MET and D7S8
regions included for comparison. The physical distances
between these markers as well as their relationship to
the MET and D7S8 regions are shown in Figure 14.

NOTES FOR TABLE 2

(a) The number of N and CF-PI (CF with pancreatic
    insufficiency) chromosomes were derived from the
    parents in the families used in linkage analysis
    [Tsui et al, Cold Spring Harbor Symp. Quant. Biol.
    51:325 (1986)].

(b) Standardized association (A), which is less
    influenced by the fluctuation of DNA marker allele
    distribution among the N chromosomes, is used here
    for the comparison. Yule's association coefficient is
    A=(ad-bc)/(ad+bc), where a, b, c, and d are the
    number of N chromosomes with DNA marker allele 1, CF
    with 1, N with 2, and CF with 2 respectively.
    Relative risk can be calculated using the
    relationship RR = (1+A)/(1-A) or its reverse.

(c) Allelic association (*), calculated according to A.
    Chakravarti et al, Am. J. Hum. Genet. 36:1239
    (1984) assuming the frequency of 0.02 for CF
    chromosomes in the population, is included for
    comparison.
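
Note (b) can be turned into a small worked example. The sketch below implements the two formulas exactly as stated in the note; the function names and the chromosome counts are hypothetical and are not values from Table 2.

```python
def yule_association(a, b, c, d):
    """Yule's association coefficient A = (ad - bc) / (ad + bc).

    a: N chromosomes with marker allele 1
    b: CF chromosomes with marker allele 1
    c: N chromosomes with marker allele 2
    d: CF chromosomes with marker allele 2
    """
    return (a * d - b * c) / (a * d + b * c)


def relative_risk(assoc):
    """Relative risk from the standardized association, RR = (1 + A) / (1 - A)."""
    return (1 + assoc) / (1 - assoc)


# Hypothetical counts, for illustration only.
a, b, c, d = 30, 10, 10, 30
A = yule_association(a, b, c, d)
RR = relative_risk(A)
print(round(A, 3), round(RR, 3))
```

Substituting the definition of A shows that (1+A)/(1-A) reduces to ad/bc, the cross-product ratio of the 2x2 table, so the sign of A indicates which allele is enriched on CF chromosomes and "its reverse" is simply bc/ad.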



Because of the small number of recombinant families
available for the analysis, as was expected from the
close distance between the markers studied and CF, and
the possibility of misdiagnosis, alternative approaches
were necessary in further fine mapping of the CF gene.

3.4 ALLELIC ASSOCIATION

Allelic association (linkage disequilibrium) has
been detected for many closely linked DNA markers. While
the utility of using allelic association for measuring
genetic distance is uncertain, an overall correlation has
been observed between CF and the flanking DNA markers. A
strong association with CF was noted for the closer DNA
markers, D7S23 and D7S122, whereas little or no
association was detected for the more distant markers
MET, D7S8 or D7S424 (see Figure 1).

As shown in Table 2, the degree of association
between DNA markers and CF (as measured by Yule's
association coefficient) increased from 0.35 for metH and
0.17 for J32 to 0.91 for 10-1X.6 (only CF-PI patient
families were used in the analysis as they appeared to be
genetically more homogeneous than CF-PS). The
association coefficients appeared to be rather constant
over the 300 kb from EG1.4 to H1.3; the fluctuations
detected at several locations, most notably at H2.3A,
E4.1 and T6/20, were probably due to the variation in the
allelic distribution among the N chromosomes (see Table
2). These data are therefore consistent with the result
from the study of recombinant families (see Figure 14).
A similar conclusion could also be made by inspection of
the extended DNA marker haplotypes associated with the CF
chromosomes (see below). However, the strong allelic
association detected over the large physical distance
between EG1.4 and H1.3 did not allow further refined
mapping of the CF gene. Since J44 was the last genomic
DNA clone isolated by chromosome walking and jumping
before a cDNA clone was identified, the strong allelic
association detected for the JG2E1-J44 interval prompted
us to search for candidate gene sequences over this
entire interval. It is of interest to note that the
highest degree of allelic association was, in fact,
detected between CF and the 2 RFLPs detected by 10-1X.6,
a region near the major CF mutation.

Table 3 shows pairwise allelic association between 
DNA markers closely linked to CF. The average number of 
chromosomes used in these calculations was 75-80 and only 
chromosomes from CF-PI families were used in scoring CF 
chromosomes. Similar results were obtained when Yule's 
standardized association (A) was used. 
Strong allelic association was also detected among
subgroups of RFLPs on both the CF and N chromosomes. As
shown in Table 3, the DNA markers that are physically
close to each other generally appeared to have strong
association with each other. For example, strong (in
some cases almost complete) allelic association was
detected between adjacent markers E6 and E7, between
pH131 and W3D1.4, between the AccI and HaeIII polymorphic
sites detected by 10-1X.6, and amongst EG1.4, JG2E1,
E2.6(E.9), E2.8 and E4.1. The two groups of distal
markers in the MET and D7S8 region also showed some
degree of linkage disequilibrium among themselves but
they showed little association with markers from E6 to
CE1.0, consistent with the distant locations for MET and
D7S8. On the other hand, the lack of association between
DNA markers that are physically close may indicate the
presence of recombination hot spots. Examples of these
potential hot spots are the region between E7 and pH131,
around H2.3A, and between J44 and the regions covered by the
probes 10-1X.6 and T6/20 (see Figure 14). These regions,
containing frequent recombination breakpoints, were
useful in the subsequent analysis of extended haplotype
data for the CF region.

3.5 HAPLOTYPE ANALYSIS

Extended haplotypes based on 23 DNA markers were 
generated for the CF and N chromosomes in the collection 
of families previously used for linkage analysis. 
Assuming recombination between chromosomes of different 
haplotypes, it was possible to construct several lineages 
of the observed CF chromosomes and, also, to predict the 
location of the disease locus. 

To obtain further information useful for
understanding the nature of different CF mutations, the
F508 deletion data were correlated with the extended DNA
marker haplotypes. As shown in Table 4, five major
groups of N and CF haplotypes could be defined by the
RFLPs within or immediately adjacent to the putative CF
gene (regions 6-8).

TABLE 4 (continued)
(a) The extended haplotype data are derived from the CF
families used in previous linkage studies (see footnote
(a) of Table 3) with additional CF-PS families collected
subsequently (Kerem et al, Am. J. Hum. Genet. 44:827 (1989)).
The data are shown in groups (regions) to reduce space.
The regions are assigned primarily according to pairwise
association data shown in Table 4 with regions 6-8
spanning the putative CF locus (the F508 deletion is
between regions 6 and 7). A dash (-) is shown at the
region where the haplotype has not been determined due to
incomplete data or inability to establish phase.
Alternative haplotype assignments are also given where
data are incomplete. Unclassified includes those
chromosomes with more than 3 unknown assignments. The
haplotype definitions for each of the 9 regions are:

Region 1-   metD   metD   metH
            BanI   TaqI   TaqI
   A =       1      1      1
   B =       2      1      2
   C =       1      1      2
   D =       2      2      1
   E =       1      2
   F =       2      1      1
   G =       2      2      2


Region 2-   E6     E7     pH131   W3D1.4
            TaqI   TaqI   HinfI   HindIII
   A =       1      2      2       2
   B =       2      1      1       1
   C =       1      2      1       1
   D =       2      1      2       2
   E =       2      2      2       1
   F =       2      2      1       1






G = 
H « 

Region 3- 

5 

A «= 
B «= 

10 Region 4- 



A 
B 

15 C 
D 
E 



20 



25 



Region 5- 



A 
B 
C 



-1 * - 
1 

H2.3A 
Taal 

1 
2 

EG1«4 

i 

2 
2 
1 
1 



E2.6 
MspI 

2 
1 
2 
2
1 



EG1.4 
Ball 

1 

2 
2 
1 
2 



E2.8 

1 
2 
2 



1 
2 



JG2E1 
PStl 

2 
1 
2 
1 
1 



E4.1 

2 
1 
2 



2 
2 




Region 6-   J44    10-1X.6   10-1X.6
                   AccI      HaeIII
   A =       1      2         1
   B =       2      1         2
   C =       1      1         2
   D =       1      2         2
   E =       2      2         2
   F =       2      2         1


Region 7-   T6/20
            MspI
   A =       1
   B =       2

Region 8-   H1.3   CE1.0
            NcoI   NdeI
   A =       2      1
   B =       1      2
   C =       1      1
   D =       2      2


Region 9-   J32    J3.11   J29
            SacI   MspI    PvuII
   A =       1      1       1
   B =       2      2       2
   C =       2      1       2
   D =       2      2       1
   E =       2      1       1


(b) Number of chromosomes scored in each class:
    CF-PI(F) = CF chromosomes from CF-PI patients with
               the F508 deletion;
    CF-PS(F) = CF chromosomes from CF-PS patients with
               the F508 deletion;
    CF-PI = Other CF chromosomes from CF-PI patients;
    CF-PS = Other CF chromosomes from CF-PS patients;
    N = Normal chromosomes derived from carrier parents.



It was apparent that most recombinations between
haplotypes occurred between regions 1 and 2 and between
regions 8 and 9, again in good agreement with the
relatively long physical distance between these regions.
Other, less frequent, breakpoints were noted between
short distance intervals and they generally corresponded
to the hot spots identified by pairwise allelic
association studies as shown above. It is of interest to
note that the F508 deletion associated almost exclusively
with Group I, the most frequent CF haplotype, supporting
the position that this deletion constitutes the major
mutation in CF. More important, while the F508 deletion
was detected in 89% (62/70) of the CF chromosomes with
the AA haplotype (corresponding to the two regions, 6 and
7) flanking the deletion, it was not found in the 14
N chromosomes within the same group ( * - 47.3, p <i 0 -) 
The F508 deletion was therefore not a sequence 
polymorphism associated with the core of the Group I 
haplotype (see Table 5) . 
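
The counts quoted above (62 of 70 CF chromosomes carrying the deletion, against 0 of 14 N chromosomes in the same haplotype group) form a 2x2 table that can be checked directly. The sketch below assumes that the quoted statistic of 47.3 is an uncorrected Pearson chi-square with one degree of freedom; that reading is an inference from the numbers, and the function names are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def p_value_1df(chi2):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Rows: CF and N chromosomes; columns: with and without the F508 deletion.
chi2 = chi_square_2x2(62, 8, 0, 14)
p = p_value_1df(chi2)
print(round(chi2, 1), p)
```

Under these assumptions the statistic rounds to 47.3, in agreement with the value given in the text, and the corresponding probability is very small.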
Together, the results of the oligonucleotide
hybridization study and the haplotype analysis support
the conclusion that the gene locus described here is the CF
gene and that the 3 bp (F508) deletion is the most common
mutation in CF.

3.6 INTRON/EXON BOUNDARIES

The entire genomic CF gene includes all of the
regulatory genetic information as well as intron genetic
information which is spliced out in the expression of the
CF gene. Portions of the introns at the intron/exon
boundaries for the exons of the CF gene are very helpful
in locating mutations in the CF gene, as they permit PCR
analysis from genomic DNA. Genomic DNA can be obtained
from any tissue, including leukocytes from blood. Such
intron information can be employed in PCR analysis for
purposes of CF screening, which will be discussed in more
detail in a later section. As set out in Figure 18 with
the headings "Exon 1 through Exon 24", there are portions
of the bounding introns, in particular those that flank
the exons, which are essential for PCR exon amplification.
Further assistance in interpreting the information
of Figure 18 is provided in Figure 21. Genomic DNA
clones containing the coding region of the CFTR gene are
provided. As is apparent from Figure 21, there are
considerable gaps between the clones of the exons, which
indicates the gaps in the intron portions between the
exons of Figure 18. These gaps in the intron portions
are indicated by "...". In Figure 21, the clones were
mapped using different restriction endonucleases (AccI,A;
AvaI,W; BamHI,B; BglII,G; BssHI,Y; EcoRV,V; FspI,F;
HincII,C; HindIII,H; KpnI,K; NcoI,J; PstI,P; PvuII,U;
SmaI,M; SacI,S; SspI,E; StyI,T; XbaI,X; XhoI,O). In
Figure 21, the exons are represented by boxed regions.
The open boxes indicate non-coding portions of the exons,
whereas closed boxes indicate coding portions. The
probable positions of the exons within the genomic DNA
are also indicated by their relative positions. The
arrows above the boxes mark the location of the
oligonucleotides used as sequencing primers in the PCR
amplification of the genomic DNA. The numbers provided
beneath the restriction map represent the size of the
restriction fragments in kb.

In sequencing the intron portions, it has been
determined that there are at least 27 exons instead of
the previously reported 24 exons in applicants'
aforementioned co-pending applications. Exons 6, 14 and
17, as previously reported, are found to be in segments
and are now named exons 6a, 6b, exons 14a, 14b and exons
17a, 17b.

The intron portions, which have been used in PCR
amplification, are identified in the following Table 5
and underlined in Figure 18. The portions identified by
the arrows are selected, but it is understood that other
portions of the intron sequences are also useful in the
PCR amplification technique. For example, for exon 10
the relevant genetic information which is preferred in
PCR is noted by reference to the 5' and 3' ends of the
sequence. The intron section is identified with an "i".
Hence in Table 5 for exon 2, the preferred portions are
identified by 2i-5 and 2i-3 and similarly for exons 3
through 24. For exon 1, the selected portions include
the sequence GGA...AAA for B115-B and ACA...GTG for 10D.
For exon 13, portions are identified by two sets: 13i-5
and C1-1M, and X13B-5 and 13i-3A. (Exon 13 is
large and is most practically amplified in two
sections.) C1-1M and X13B-5 are from exon sequences.
The specific conditions for PCR amplification of
individual exons are summarized in the following Table 6
and are discussed in more detail hereinafter with respect
to the procedure explained in R.K. Saiki et al, Science
230:1350 (1985).

These oligonucleotides, as derived from the intron 
sequence, assist in amplifying by PCR the respective 
exon, thereby providing for analysis for DNA sequence 
alterations corresponding to mutations of the CF gene. 
The mutations can be revealed by either direct sequence 
determination of the PCR products or sequencing the 
products cloned in plasmid vectors. The amplified exon 
can also be analyzed by use of gel electrophoresis in the 
manner to be further described. It has been found that 
the sections of the intron for each respective exon are 
of sufficient length to work particularly well with PCR 
technique to provide for amplification of the relevant 
exon. 
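
How a pair of intron-derived primers defines the amplified product can be sketched as follows. Everything in the block is hypothetical: the template, the primer sequences and the function names are toy values chosen for illustration and are not taken from Table 5 or Figure 18.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def amplicon(template, forward, reverse):
    """Return the region bounded by the two primers, or None if either is absent.

    forward matches the given strand; reverse is written 5'->3' on the
    opposite strand, so its reverse complement is searched on the given strand.
    """
    start = template.find(forward)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Toy example: flanking primers around a short stand-in for an exon.
forward_primer = "GATTACAGG"
reverse_primer = "CCTTGGAAC"
toy_exon = "ACGTACGTAC"
toy_template = (
    "TTTT" + forward_primer + toy_exon
    + reverse_complement(reverse_primer) + "GGGG"
)
product = amplicon(toy_template, forward_primer, reverse_primer)
print(len(product))
```

The product length is the sum of both primer lengths and the sequence between them, which is how the amplified product sizes listed for each primer pair relate to the positions of the primers in the flanking introns.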



TABLE 5 



Oligonucleotides used for amplification of CF gene exons by PCR



Exon PCR primers; 5'-> 3' 



10 
11 
12 
13 



1 GGAGTTCACTCAOCTAAA (B1IS-B) 
ACACGCCCTCCTCTTTOGTG (10D) 

2 CCAAATCTGTATGG AGACCA (2i-5) 
TATGTTGCCCAGGCTGOTAT (2i-3) 

3 CTTOOOTTAATCrCCTTGGA (3i-5) 
AT7CACCAGATTTCGTAG7C C3i-3) 

4 TCACATATGGTATOACCCTC (41-5) 
TTOTACCAGCICACTACCTA (4i-3) 

5 ATTTCTOOCTAOATGCTGGG (Si-5) 
AACTOOOCCTTT0CAGTTGT (Si-3) 

6a TTAOTOTOCTCAOAAOCACO (6Ai-5) 
CTATOCATAOAGCACTCCTO (6Ai-3) 

6b TQOAATOAOTCTOTACAOCO (6Ci-5) 
C AGOTCC AAOTCTACCATGA (60-3) 

7 AO AOCATGCTCAOATCnCCAT (7i-5) 
OCAAAOTTCATTAOAACTOATC (Ti-3) 

8 TOAATCCTAOTGCTTOOCAA (8i-5) 
TOGCCATTAOOATQAAATCC (81-3) 

9 TAATOO ATCATCGOCCATOT (9i-5) 
ACAGIXnTOAATOTQOTOCA (K-3) 

OCAOAOTAOCTO AAACAOOA (10i-5) 
CATICACAiOTAOCTTAOOCA (10i-3) 
CAACTO TO OT TA AAOCAATAOTOT (1 11-5) 
OCACAOATTCTOAOTAACCATAAT (1 li-3) 
OTOAATOOATOTOOTOAOCA (121-5) 
CTOOTTTAOCATOAOOCOOT (121-3) 
(a) TOCTAAAATAOOAOACATA1TGCA (131-5) 

ATCTOGTACTAAGOACAG (C1-1M) 
<b) TCAAT0CAATCAACICTATAO0AA (X13B-5) 
TACAOCTTAT0CTAAT0CTATOAT (131-3A) 
14a AAAAOOTATQCCACTOTTAAO (14 Al-S) 
OTATACATOOOCAAACTATCT (14 Al-3) 
14b OAACAOCTAOTACAOCrOCr (14B1-5) 
AACTCC7XJOOCTCAAOTOAT (I4BI-3) 

15 OTOCATOCTCTTCTAAT3CA (151-5) 
AAGGCACATOCCTCTGTQCA (15i-3) 

16 CAOAOAAATTOOTOOTTACT (16i-5) 
ATCTAAATOTOOGATTOCCT (161-3) 

17a CAATOTOCACATOTACCCTA (17AI-5) 
TGTACAOCAACTQTGGTAAO (17Ai-3) 
17b TTCAAAOAATGGCACCAGTGT (17B1-5) 

ATAAOCTATAOAATGCAGCA (17B1-3) 
18 OTAOATOCTOTOATOAACra (181-5) 
" AOTOOCTATCTATGAQAAOa (181-3) 

19 GOOOQACAAATAAOCAAGTQA (19i-5) 
GCTAACACATTGCTTCAGGCT (19i-3) 

20 GGTCAGGATTQAAAOTGTOCA (20i-5) 
CTATOAOAAAACrGCACTGOA (20i-3) 

21 AATOTTCACAAOOOACTCCA (21i-5) 
CAAAAOTAOCTGTTOCTXXA (211-3) 

22 AAAOGCTOAGCCTGACAAOA (221-5) 
TGTCACCATOAAOCAGGCAT (221-3) 

23 AGCTOATTOTOCOTAACGCT (231-5) 
TAAAOCTGGATGOCTOTATG (231-3) 

24 GGACACAGCAGTTAAATCTQ (241-5) 
ACTATTOOCAGG AAGOCATT (241-3) 



Amplified product (bp) 
933 
378 
309 
438 
395 
385 
417
410 
359 
560 
491 
425 
426 
528
497 
511 
449 
485 
570 
579 
463 
451 
454 
473 
477 
562
400 
569 




TABLE 6

                                      Initial        ----------- Thermal cycle -----------   Final
                                      denaturation   Denaturation  Annealing    Extension     extension
Exon                     Buffer(a)    time/temp      time/temp     time/temp    time/temp     time/temp

3-5, 6a, 6b, 7-10, 12,   A(1.5)       6 min/94C      30 sec/94C    30 sec/55C   1 min/72C     7 min/72C
14a, 16, 17b, 18-24
1                        B            6 min/94C      30 sec/94C    30 sec/55C   1.5 min/72C   7 min/72C
2, 11                    B            6 min/94C      30 sec/94C    30 sec/52C   1 min/72C     7 min/72C
13a                      A(1.75)      6 min/94C      30 sec/94C    30 sec/54C   2.5 min/72C   7 min/72C
13b                      A(1.75)      6 min/94C      30 sec/94C    30 sec/52C   2.5 min/72C   7 min/72C
14b                      B            6 min/94C      30 sec/94C    30 sec/56C   1 min/72C     7 min/72C
17a                      A(1.5)       6 min/94C      30 sec/94C    30 sec/56C   1 min/72C     7 min/72C


(a) Buffer A(1.5): buffer with 1.5 mM MgCl2 
Buffer A(1.75): buffer with 1.75 mM MgCl2 

Buffer B: 67 mM Tris-HCl pH 8.8, 6.7 mM MgCl2, 16.6 mM (NH4)2SO4, 0.67 µM EDTA, 
10 mM β-mercaptoethanol, 170 µg/ml BSA, 10% DMSO, 1.5 mM of each dNTP 

(dNTPs = deoxynucleotide triphosphates) 

3.7 CF MUTATIONS - ΔI506 OR ΔI507 

The association of the F508 deletion with 1 common 
and 1 rare CF haplotype provided further insight into the 
number of mutational events that could contribute to the 
present patient population. Based on the extensive 
haplotype data, the original chromosome in which the F508 
deletion occurred is likely to carry the haplotype 
-AAAAAAA- (Group Ia), as defined in Table 4. The other 
Group I CF chromosomes carrying the deletion are probably 
recombination products derived from the original 
chromosome. If the CF chromosomes in each haplotype 
group are considered to be derived from the same origin, 
only 3-4 additional mutational events would be predicted 
(see Table 4). However, since many of the CF chromosomes 
in the same group are markedly different from each other, 
further subdivision within each group is possible. As a 
result, a higher number of independent mutational events 
could be considered and the data suggest that at least 7 
additional, putative mutations also contribute to the 
CF-PI phenotype (see Table 3). The mutations leading to 
the CF-PS subgroup are probably more heterogeneous. 

The 7 additional CF-PI mutations are represented by 
the haplotypes: -CAAAAAA- (Group Ib), -CABCAAD- (Group 
Ic), BBBAC- (Group IIa), -CABBBAB- (Group Va). 
Although the molecular defect in each of these mutations 
has yet to be defined, it is clear that none of these 
mutations severely affects the region corresponding to the 
oligonucleotide binding sites used in the 
PCR/hybridization experiment. 

One CF chromosome hybridizing to the ΔF508-ASO 
probe, however, has been found to associate with a 
different haplotype (group IIIa). It initially appeared 
that the ΔF508 deletion had occurred on both haplotypes, 
but the discovery of ΔI507 showed that this is not the 
case. Instead, ΔF508 is in group Ia, whereas ΔI507 is 
in group IIIa. None of the other CF nor the normal 
chromosomes of this haplotype group (IIIa) have shown 
hybridization to the mutant (ΔF508) ASO [B. Kerem et al, 
Science 245:1073 (1989)]. In view of the group Ia and 
IIIa haplotypes being distinctly different from each 
other, the mutations harbored by these two groups of CF 
chromosomes must have originated independently. To 
investigate the molecular nature of the mutation in this 
group IIIa CF chromosome, we further characterized the 
region of interest through amplification of the genomic 
DNA from an individual carrying the chromosome IIIa by 
the polymerase chain reaction (PCR). 

These polymerase chain reactions (PCR) were 
performed according to the procedure of R.K. Saiki et al, 
Science 230:1350 (1985). A specific DNA segment of 491 
bp including exon 10 of the CF gene was amplified with 
the use of the oligonucleotide primers 10i-5 
(5'-GCAGAGTACCTGAAACAGGA-3') and 10i-3 
(5'-CATTCACAGTAGCTTACCCA-3') located in the 5' and 3' 
flanking regions, respectively, as shown in Figure 18 and 
itemized in Table 5. Both oligonucleotides were 
purchased from the HSC DNA Biotechnology Service Center 
(Toronto). Approximately 500 ng of genomic DNA from 
cultured lymphoblastoid cell lines of the parents and the 
CF child of Family 5 were used in each reaction. The DNA 
samples were denatured at 94°C for 30 sec, primers 
annealed at 55°C for 30 sec, and extended at 72°C for 50 
sec (with 0.5 unit of Taq polymerase, Perkin- 
Elmer/Cetus, Norwalk, CT) for 30 cycles and a final 
extension period of 7 min in a Perkin-Elmer/Cetus DNA 
Thermal Cycler. Reaction conditions for PCR 
amplification of other exons are set out in Table 6. 
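
The cycling parameters quoted above for the exon 10 reaction can be written down as data, which makes it easy to check how long the programmed holds add up to. The following is only a sketch: the dictionary layout and the names `EXON10_PROGRAM` and `programmed_seconds` are illustrative assumptions, and ramping time between temperatures is ignored.

```python
# Cycling parameters quoted in the text for the exon 10 amplification.
# The data layout and all names here are illustrative assumptions.
EXON10_PROGRAM = {
    "cycles": 30,
    "steps": [  # (label, temperature in deg C, hold time in seconds)
        ("denature", 94, 30),
        ("anneal", 55, 30),
        ("extend", 72, 50),
    ],
    "final_extension_s": 7 * 60,
}


def programmed_seconds(program):
    """Sum of hold times only; ramping between temperatures is ignored."""
    per_cycle = sum(seconds for _, _, seconds in program["steps"])
    return program["cycles"] * per_cycle + program["final_extension_s"]


total = programmed_seconds(EXON10_PROGRAM)
print(f"{total} s of programmed hold time ({total / 60:.0f} min)")
```

The same structure could hold a row of Table 6 by adding an initial denaturation entry, if one wished to compare programs across exons.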

Hybridization analysis of the PCR products from 
three individuals of Family 5 of group IIIa was 
performed. The carrier mother and father are represented 
by a half-filled circle and square, respectively, and the 
affected son is a filled square in Figure 19a. The 
conditions for hybridization and washing have been 
previously described (Kerem et al, supra). There is a 
relatively weak signal in the father's PCR product with 
the mutant (oligo ΔF508) probe. In Figure 19b, DNA 
sequence analysis of the clone 5-3-15 and the PCR 
products from the affected son and the carrier father are 
shown. The arrow in the center panel indicates the 
presence of both A and T nucleotide residues in the same 
position; the arrow in the right panel indicates the 
points of divergence between the normal and the ΔI507 
sequence. The sequence ladders shown are derived from 
the reverse-complements as will be described later. 
Figure 19c shows the DNA sequences and their 
corresponding amino acid sequences of the normal, ΔI507, 
and ΔF508 alleles spanning the mutation sites. 
With reference to Figure 19a, the PCR-amplified DNA from 
the carrier father, who contributed the group IIIa CF 
chromosome to the affected son, hybridized less 
efficiently with the ΔF508 ASO than that from the mother 
who carried the group Ia CF chromosome. The difference 
became apparent when the hybridization signals were 
compared to that with the normal ASO probe. This result 
therefore indicated that the mutation carried by the 
group IIIa CF chromosome might not be identical to ΔF508. 

To define the nucleotide sequence corresponding to 
the mutant allele on this chromosome, the PCR-amplified 
product of the father's DNA was excised from a 
polyacrylamide-electrophoretic gel and cloned into a 
sequencing vector. 

The general procedures for DNA isolation and 
purification for purposes of cloning into a sequencing 
vector are described in J. Sambrook, E.F. Fritsch, T. 
Maniatis, Molecular Cloning: A Laboratory Manual, 2nd ed. 
(Cold Spring Harbor Press, N.Y. 1989). The two 
homoduplexes generated by PCR amplification of the 
paternal DNA were purified from a 5% non-denaturing 
polyacrylamide gel (30:1 acrylamide:bis-acrylamide). The 
appropriate bands were visualized by staining with 
ethidium bromide, excised and eluted in TE (10 mM Tris- 
HCl; 1 mM EDTA; pH 7.5) for 2 to 12 hours at room 
temperature. The DNA solution was sequentially treated 
with Tris-equilibrated phenol, phenol/CHCl3 and CHCl3. 
The DNA samples were concentrated by precipitation in 
ethanol and resuspension in TE, incubated with T4 
polynucleotide kinase in the presence of ATP, and ligated 
into dephosphorylated, blunt-ended Bluescript KS- vector 
(Stratagene, San Diego, CA). Clones containing amplified 
product generated from the normal parental chromosome 
were identified by hybridization with the oligonucleotide 
N as described in Kerem et al supra. 

Clones containing the mutant sequence were 
identified by their failure to hybridize to the normal 
ASO (Kerem et al, supra). One clone, 5-3-15, was isolated 
and its DNA sequence determined. The general protocol 
for sequencing cloned DNA is essentially as described 
[J.R. Riordan et al, Science 245:1066 (1989)] with the 
use of a U.S. Biochemicals Sequenase™ kit. To verify the 
sequence and to exclude any errors introduced by DNA 
polymerase during PCR, the DNA sequences for the PCR 
products from the father and one of the affected children 
were also determined directly without cloning. 

This procedure was accomplished by denaturing 2 
pmoles of gel-purified double-stranded PCR product in 0.2 
M NaOH/0.2 mM EDTA (5 min at room temperature), 
neutralizing by adding 0.1 volume of 2 M ammonium acetate 
(pH 5.4) and precipitating with 2.5 volumes of ethanol 
at -70°C for 10 min. After washing with 70% ethanol, the 
DNA pellet was dried and redissolved in a sequencing 
reaction buffer containing 4 pmoles of the 
oligonucleotide primer 10i-3 of Figure 18, dithiothreitol 
(8.3 mM) and [α-35S]-dATP (0.8 µM, 1000 Ci/mmole). The 
mixture was incubated at 37°C for 20 min, following 
which 2 µl of labelling mix, as included in the 
Sequenase™ kit, and then 2 units of Sequenase enzyme were 
added. Aliquots of the reaction mixture (3.5 µl) were 
transferred, without delay, to tubes each containing 2.5 
µl of ddGTP, ddATP, ddTTP and ddCTP solutions (U.S. 
Biochemicals Sequenase kit) and the reactions were 
stopped by addition of the stop solution. 

The DNA sequence for this mutant allele is shown in 
Figure 19b. The data derived from the cloned DNA and 
direct sequencing of the PCR products of the affected 
child and the father are all consistent with a 3 bp 
deletion when compared to the normal sequence (Figure 
19c). The deletion of this 3 bp (ATC) at the I506 or 
I507 position results in the loss of an isoleucine 
residue from the putative CFTR, within the same ATP- 
binding domain where ΔF508 resides, but it is not evident 
whether this deleted amino acid corresponds to 
position 506 or 507. Since the 506 and 507 positions are 
repeats, it is at present impossible to determine in 
which position the 3 bp deletion occurs. For convenience 
in later discussions, however, we refer to this deletion 
as ΔI507. 

The fact that the ΔI507 and ΔF508 mutations occur in 
the same region of the presumptive ATP-binding domain of 
CFTR is surprising. Although the entire sequence of the 
ΔI507 allele has not been examined, as has been done for 
ΔF508, the strategic location of the deletion argues that 
it is the responsible mutation for this allele. This 
argument is further supported by the observation that 
this alteration was not detected in any of the normal 
chromosomes studied to date (Kerem et al, supra). The 
identification of a second single amino acid deletion in 
the ATP-binding domain of CFTR also provides information 
about the structure and function of this protein. Since 
deletion of either the phenylalanine residue at position 
508 or the isoleucine at position 507 is sufficient to 
affect the function of CFTR such that it causes CF 
disease, it is suggested that these residues are involved 
in the folding of the protein but not directly in the 
binding of ATP. That is, the length of the peptide is 
probably more important than the actual amino acid 
residues in this region. In support of this hypothesis, 
it has been found that the phenylalanine residue can be 
replaced by a serine, and the isoleucine at position 506 
by valine, without apparent loss of function of CFTR. 

When the nucleotide sequence of ΔI507 is compared to 
that of ΔF508 at the ASO-hybridizing region, it was noted 
that the difference between the two alleles was only an A 
-> T change (Figure 19c). This subtle difference thus 
explained the cross-hybridization of the ΔF508-ASO to 
ΔI507. These results therefore exemplified the 
importance of careful examination of both parental 
chromosomes in performing ASO-based genetic diagnosis. 
It has been determined that the ΔF508 and ΔI507 mutations 
can be distinguished by increasing the stringency of 
the oligonucleotide hybridization conditions or by detecting 
the unique mobility of the heteroduplexes formed between 
each of these sequences and the normal DNA on a 
polyacrylamide gel. The stringency of hybridization can 
be increased by using a washing temperature of 45°C 
instead of the prior 39°C in the presence of 2XSSC (1XSSC 
= 150 mM NaCl and 15 mM Na citrate). 
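
Both observations, that the isoleucine deletion cannot be assigned to position 506 or 507 and that the two deletion alleles differ by a single base, can be reproduced on a short stretch of sequence. The sketch below is illustrative only. The 18-nucleotide fragment `NORMAL` (codons 505 to 510, Asn-Ile-Ile-Phe-Gly-Val) is an assumption taken from the generally published CFTR cDNA sequence rather than from this section, and the helper names are hypothetical.

```python
# ASSUMPTION: 18-nt stretch covering codons 505-510 (Asn-Ile-Ile-Phe-Gly-Val),
# taken from the published CFTR cDNA sequence, not from this section.
NORMAL = "AATATCATCTTTGGTGTT"

# Only the codons that occur in this fragment and its deletion variants.
CODONS = {"AAT": "N", "ATC": "I", "ATT": "I", "TTT": "F", "GGT": "G", "GTT": "V"}


def delete(seq, start, length=3):
    """Return seq with `length` bases removed starting at 0-based `start`."""
    return seq[:start] + seq[start + length:]


def translate(seq):
    """Translate an in-frame sequence using the small codon table above."""
    return "".join(CODONS[seq[i:i + 3]] for i in range(0, len(seq), 3))


def mismatches(a, b):
    """List (0-based index, base in a, base in b) wherever a and b differ."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


del_ile_506 = delete(NORMAL, 3)  # remove the first ATC
del_ile_507 = delete(NORMAL, 6)  # remove the second ATC
del_phe_508 = delete(NORMAL, 8)  # remove CTT spanning codons 507/508

print(del_ile_506 == del_ile_507)                      # True: same product
print(translate(del_ile_507), translate(del_phe_508))  # NIFGV NIIGV
print(mismatches(del_phe_508, del_ile_507))            # [(6, 'A', 'T')]
```

A single mismatch in the middle of a short probe is consistent with the weak cross-hybridization described above and with the need for a more stringent wash to separate the two alleles.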

Identification of the ΔI507 and ΔF508 alleles by 
polyacrylamide gel electrophoresis is shown in Figure 20. 
The PCR products were prepared from the three family 
members and separated on a 5% polyacrylamide gel as 
described above. A DNA sample from a known heterozygous 
ΔF508 carrier is included for comparison. With reference 
to Figure 20, the banding pattern of the PCR-amplified 
genomic DNA from the father, who is the carrier of ΔI507, 
is clearly distinguishable from that of the mother, who 
is a carrier of the ΔF508 mutation. In 
this gel electrophoresis test, there were actually three 
individuals (the carrier father and the two affected sons 
in Family 5) who carried the ΔI507 deletion. Since they 
all belong to the same family, they represent only one 
single CF chromosome in our population analysis [Kerem et 
al, supra]. The two patients who also inherited the ΔF508 
mutation from their mother showed typical symptoms of CF 
with pancreatic insufficiency. The father of this family 
was the only parent who carries this ΔI507 mutation; no 
other CF parents showed reduced hybridization intensity 
signal with the ΔF508 mutant oligonucleotide probe or a 
peculiar heteroduplex pattern for the PCR product (as 
defined above) in the retrospective study. In addition, 
two representatives of the group IIIb and one of the 
group IIIc CF chromosomes from our collection [Kerem et 
al, supra] were sequenced, but none were found to contain 
ΔI507. Since the electrophoresis technique eliminates 
the need for probe-labelling and hybridization, it may 
prove to be the method of choice for detecting carriers 
on a large population scale [J. M. Rommens et al, Am. J. 
Hum. Genet. 46:395-396 (1990)]. 

The present data also indicate that there is a 
strict correlation between DNA marker haplotype and 
mutation in CF. The ΔF508 deletion is the most common CF 
mutation that occurred on a group Ia chromosome 
background (Kerem et al, supra). The ΔI507 mutation is, 
however, rare in the CF population; the one group IIIa CF 
chromosome carrying this deletion is the only example in 
our studied population (1/219). Since the group III 
haplotype is relatively common among the normal 
chromosomes (17/198), the ΔI507 deletion probably 
occurred recently. Additional studies with larger 
populations of different geographic and ethnic 
backgrounds should provide further insight in 
understanding the origins of these mutations. 
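
The two counts quoted above are easier to compare as percentages. The short sketch below does only that arithmetic; the variable names are illustrative assumptions.

```python
from fractions import Fraction

# Counts quoted in the text; variable names are illustrative.
delta_i507_in_cf = Fraction(1, 219)      # CF chromosomes carrying the deletion
group_iii_in_normal = Fraction(17, 198)  # normal chromosomes with a group III haplotype

print(f"deletion among CF chromosomes:      {float(delta_i507_in_cf):.2%}")
print(f"group III among normal chromosomes: {float(group_iii_in_normal):.2%}")
print(f"ratio: {float(group_iii_in_normal / delta_i507_in_cf):.1f}x")
```

The haplotype background is thus roughly nineteen times more frequent among normal chromosomes than the deletion is among CF chromosomes, which is the contrast the argument for a recent origin rests on.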

3.8 ADDITIONAL CF MUTATIONS 

Following the above procedures, other mutations in 
the CF gene have been identified. The following brief 
description of each identified mutation is based on the 
previously described procedures for locating the mutation 
involving use of PCR procedures. The mutations are given 
short form names. The numbering used in these 
abbreviations refers to either the DNA sequence or the 
amino acid sequence position of the mutation depending on 
the type of mutation. For example, splice mutations and 
frameshift mutations are defined using the DNA sequence 
position. Most other mutations derive their nomenclature 
from the amino acid residue position. The description of 
each mutation clarifies the nomenclature in any event. 

For example, mutations G542X, Q493X, 3659 del C and 556 
del A result in shortened polypeptides significantly 
different from the single amino acid deletions or 
alterations. G542X and Q493X involve a polypeptide 
including only the first 541 and 492 amino acid residues, 
respectively, of the normal 1480 amino acid polypeptide. 
3659 del C and 556 del A also involve shortened versions 
and will include additional amino acid residues. 
Mutations 711+1G -> T and 1717-1G -> A are predicted to lead 
to polypeptides which cannot as yet be exactly 
defined. They probably do lead to shortened polypeptides 
but could contain additional amino acids. DNA sequence 
encoding these mutant polypeptides will probably 
contain intron sequence from the normal gene or possibly 
deleted exons. 

3.8.0 MUTATIONS IN EXON 1 

In the 129G -> C mutation, there is a single basepair 
change of G to C at nucleotide 129 of the cDNA sequence 
of Figure 1. The PCR product for amplifying genomic DNA 
containing this mutation is derived from the B115-B and 
10D primers as set out in Table 5. The genomic DNA is 
amplified as per the conditions of Table 6. 

3.8.1 MUTATIONS IN EXON 3 

The G85E mutation in exon 3 involves a G to A 
transition at nucleotide position 386. It is detected in 
family #26, a French Canadian family classified as PI. 
This predicted Gly to Glu amino acid change is associated 
with a group IIb haplotype. The mutation destroys a 
HinfI site. The PCR product derived from the 3i-5 and 
3i-3 primers, as per conditions of Table 6, is cleaved by 
this enzyme into 3 fragments of 172, 105 and 32 bp, 
respectively, for the normal sequence; a fragment of 277 
bp would be present for the mutant sequence. We analyzed 
54 CF chromosomes, 8 from group II, and 50 normal 
chromosomes, 44 from group II, and did not find another 
example of G85E. 
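
The fragment sizes quoted for the HinfI digest can be checked against the 309 bp exon 3 product listed in Table 5. The following is a minimal sketch under one stated assumption: the three normal fragments are taken to lie in the order 172, 105, 32 along the amplicon, which the text does not specify. The function name `fragments` and the site offsets are illustrative.

```python
def fragments(amplicon_len, cut_sites):
    """Fragment lengths, in 5'->3' order, for cuts at the given 0-based offsets."""
    edges = [0, *sorted(cut_sites), amplicon_len]
    return [b - a for a, b in zip(edges, edges[1:])]


AMPLICON = 309  # exon 3 product size listed in Table 5

# ASSUMPTION: the fragments lie in the order 172, 105, 32 along the amplicon,
# so the two HinfI cuts sit at offsets 172 and 277. The real order is not stated.
NORMAL_SITES = [172, 277]
MUTANT_SITES = [277]  # G85E removes the site between the 172 and 105 bp pieces

print(fragments(AMPLICON, NORMAL_SITES))  # [172, 105, 32]
print(fragments(AMPLICON, MUTANT_SITES))  # [277, 32]
```

Whatever the true order, the 277 bp mutant band is the sum of the 172 and 105 bp normal bands, so the lost site must lie between those two pieces.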

3.8.2 MUTATIONS IN EXON 4 

556 del A is a frameshift mutation in exon 4 in a 
single CF chromosome (Toronto family #17, GH1076). There 
is a deletion of A at nucleotide position 556. This 
mutation is associated with the Group IIIb haplotype and is 
not found in 31 other CF chromosomes (9 from IIIb) and 30 
N chromosomes (16 from IIIb). The mutation creates a BglI 
enzyme cleavage site. The PCR primers are 4i-5 and 4i-3 
(see Table 5) and the enzyme cuts the mutant PCR 
product (437 bp) into 2 fragments of 287 and 150 bp in 
size. 

The I148T mutation in exon 4 involves a T to C 
basepair transition at nucleotide position 575. This 
results in an Ile to Thr change at amino acid position 
148 of Figure 1. The PCR product used in amplifying 
genomic DNA containing this mutation uses primers 4i-5 
and 4i-3 as set out in Table 5. The reaction conditions 
for amplifying the genomic DNA are set out in Table 6. 

3.8.3 MUTATIONS IN EXON 5 

In mutation G178R the Gly to Arg missense mutation 
in exon 5 is due to a G to A change at nucleotide 
position 664. The mutation is found on the mother's CF 
chromosome in family #50; the other mutation in this 
family is ΔF508. Primers 5i-5 and 5i-3 were used for 
amplifying genomic DNA as outlined in Tables 5 and 6. 

3.8.4 MUTATIONS IN EXON 9 

A mutation in exon 9 is a change of alanine (GCG) to 
glutamic acid (GAG) at amino acid position 455 
(A455E). Two of the 38 non-ΔF508 CF chromosomes 
examined carry this mutation; both of them are from 
patients of French-Canadian origin, which we have 
identified in our work as families #27 and #53, and they 
belong to haplotype group Ib. The mutation is detectable 
by allele-specific oligonucleotide (ASO) hybridization 
with PCR-amplified genomic DNA sequence. The PCR primers 
are 9i-5 (5'-TAATGGATCATGGGCCATGT-3') and 9i-3 
(5'-ACAGTGTTGAATGTGGTGCA-3') for amplifying genomic DNA under 
the conditions of Table 6. The ASOs are 
5'-GTTGTTGGCGGTTGCT-3' for the normal allele and 
5'-GTTGTTGGAGGTTGCT-3' for the mutant. The oligonucleotide 
hybridization is as described in Kerem et al (1989) supra 
at 37°C and the washings are done twice with 5XSSC for 10 
min each at room temperature followed by twice with 2XSSC 
for 30 min each at 52°C. Although the alanine at 
position 455 (Ala455) is not present in all ATP-binding 
folds across species, it is present in all known members 
of the P-glycoprotein family, the protein most similar to 
CFTR. Further, A455E is believed to be a mutation 
rather than a sequence polymorphism because the change is 
not found in 16 non-ΔF508 CF chromosomes and three normal 
chromosomes carrying the same group I haplotype. 

3.8.5 MUTATIONS IN EXON 10 

In the Q493X mutation Gln493 (CAG) is changed into a 
stop codon (TAG) in Toronto family #9 (nucleotide 
position 1609 C -> T). The mutation occurs on a CF 
chromosome with haplotype IIIb; it is not found in 28 
normal chromosomes (15 of which belong to IIb) nor in 33 
other CF chromosomes (5 of which are IIIb). The mutation can 
be detected by allele-specific PCR, with 10i-5 as the 
common PCR primer, 5'-GGCATAATCCAGGAAAACTG-3' for the 
normal sequence and 5'-GGCATAATCCAGGAAAACTA-3' for the 
mutant allele. The PCR condition is 6 min at 94° 
followed by cycles of 30 sec at 94°, 30 sec at 57° and 90 
sec at 72°, with 100 ng of each primer and ~400 ng 
genomic DNA. The primers 9i-3 and 9i-5 may be used for 
internal PCR control as they share the same reaction 
condition. 
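
In allele-specific PCR the discriminating base is placed at the 3' end of the primer. The sketch below checks that the two Q493X primers quoted above differ only there, and reads them on the opposite strand to recover the CAG and TAG codons. The helper name `reverse_complement` is illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


NORMAL_PRIMER = "GGCATAATCCAGGAAAACTG"
MUTANT_PRIMER = "GGCATAATCCAGGAAAACTA"

# The two primers are identical except for the 3'-terminal base.
same_body = NORMAL_PRIMER[:-1] == MUTANT_PRIMER[:-1]
print(same_body, NORMAL_PRIMER[-1], MUTANT_PRIMER[-1])  # True G A

# Read on the opposite strand, the first three bases are the affected codon.
print(reverse_complement(NORMAL_PRIMER)[:3])  # CAG (Gln)
print(reverse_complement(MUTANT_PRIMER)[:3])  # TAG (stop)
```

Because both primers anneal to the antisense orientation, the C -> T change at nucleotide 1609 appears as G versus A at their 3' ends.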

3.8.6 MUTATIONS IN EXON 11 

In mutation G542X the glycine codon (GGA) at amino 
acid position 542 is changed to a stop codon (TGA) (G542 
-> Stop). The single chromosome carrying this mutation is 
of Ashkenazic Jewish origin (family A) and has the B 
haplotype (XV2C allele 1; KM.19 allele 2). The mutant 
sequence can be detected by hybridization analysis with 
allele-specific oligonucleotides (ASOs) on genomic DNA 
amplified under conditions of Table 6 by PCR with the 
11i-5 and 11i-3 oligonucleotide primers. The normal ASO 
is 5'-ACCTTCTCCAAGAACT-3' and the mutant ASO is 
5'-ACCTTCTCAAAGAACT-3'. The oligonucleotide hybridization 
condition is as described in Kerem et al (1989) supra and 
the washing conditions are twice in 5XSSC for 10 min 
each at room temperature followed by twice in 2XSSC for 
30 min each at 45°C. The mutation is not detected in 52 
other non-ΔF508 CF chromosomes, 11 of which are of Jewish 
origin (three have a B haplotype), nor in 13 normal 
chromosomes. 
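
The wash steps in this section are specified as multiples of SSC, with 1XSSC defined earlier in the text as 150 mM NaCl and 15 mM Na citrate. The following sketch simply scales that definition to the 5X and 2X washes; the names `ONE_X_SSC_MM` and `ssc` are illustrative assumptions.

```python
# 1XSSC as defined in the text, in millimolar. Names are illustrative.
ONE_X_SSC_MM = {"NaCl": 150, "Na citrate": 15}


def ssc(fold):
    """Salt concentrations (mM) of an SSC solution at the given fold strength."""
    return {salt: fold * mm for salt, mm in ONE_X_SSC_MM.items()}


for fold in (5, 2):
    print(f"{fold}XSSC:", ssc(fold))
```

The second wash therefore combines a lower salt concentration with a higher temperature, both of which raise stringency relative to the room-temperature 5XSSC wash.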

In mutation S549R, the highly conserved serine 
residue of the nucleotide binding domain at position 549 
is changed to arginine (S549 -> R); the codon change is 
AGT -> AGG. The CF chromosome with this mutation is 
carried by a non-Ashkenazic Jewish patient from Morocco 
(family B). The chromosome also has the B haplotype. 
Detection of this mutation may be achieved by ASO 
hybridization or allele-specific PCR. In the ASO 
hybridization procedure, the genomic DNA sequence is 
first amplified under conditions of Table 6 by PCR with 
the 11i-5 and 11i-3 oligonucleotides; the ASO for the 
normal sequence is 5'-ACACTGAGTGGAGGTC-3' and that for 
the mutant is 5'-ACACTGAGGGGAGGTC-3'. The oligonucleotide 
hybridization condition is as described by Kerem et al 
(1989) supra and the washings are done twice in 5XSSC 
for 10 min each at room temperature followed by twice in 
2XSSC for 30 min each at 56°C. In the allele-specific 
PCR amplification, the oligonucleotide primer for the 
normal sequence is 5'-TGCTCGTTGACCTCCA-3', that for the 
mutant is 5'-TGCTCGTTGACCTCCC-3' and that for the common, 
outside sequence is 11i-5. The reaction is performed 
with 500 ng of genomic DNA, 100 ng of each of the 
oligonucleotides and 0.5 unit of Taq polymerase. The DNA 
template is first denatured by heating at 94°C for 6 
min, followed by 30 cycles of 94° for 30 sec, 55° for 30 
sec and 72° for 60 sec. The reaction is completed by a 
final heating at 72° for 7 min. This S549 -> R mutation is 
not present in 52 other non-ΔF508 CF chromosomes, 11 of 
which are of Jewish origin (three have a B haplotype), 
nor in 13 normal chromosomes. 

In the S549I mutation there is an AGT -> ATT change 
(nucleotide position 1778 G -> T) which represents the third 
mutation involving this amino acid codon, resulting in a 
loss of the DdeI site. We have only one example, who is 
of Arabic origin and is sequenced; no other DdeI- 
resistant chromosome is found in 5 other Arabic CF, 21 
Jewish CF, 41 Canadian CF, and 13 Canadian normal 
chromosomes. 

In mutation R560T the arginine (AGG) at amino acid 
position 560 is changed to threonine (ACG). The 
individual carrying this mutation (R560 -> T) is from a 
family we have identified in our work as family #32 and 
the chromosome is marked by haplotype IIIb. The mutation 
creates a MaeII site which cleaves the PCR product of 
exon 11 (generated with primers 11i-5 and 11i-3 under 
conditions of Table 6) into two fragments of 214 and 204 
bp in size. None of the 36 non-ΔF508 CF chromosomes 
(seven of which have haplotype IIIb) or 23 normal 
chromosomes (16 have haplotype IIIb) carried this 
sequence alteration. The R560 -> T mutation is also not 
present on eight CF chromosomes with the ΔF508 mutation. 

In mutation G551D glycine (G) at amino acid position 
551 is changed to aspartic acid (D). G551 is a highly 
conserved residue within the ATP-binding fold. The 
corresponding codon change is from GGT to GAT. The 
G551 -> D change is found in 2 of our families (#1 and #38) 
with pancreatic insufficient (PI) CF patients and 1 
family (#54) with a pancreatic sufficient (PS) patient. 
The other CF chromosomes in families #1 and #38 carry the 
ΔF508 mutation and that in family #54 is unknown. Based 
on our "severe and mild mutation" hypothesis (Kerem et 
al. 1989), this mutation is expected to be a "severe" 
one. All 3 chromosomes carrying this mutation belong to 
Group IIIb. This G551 -> D substitution does not represent 
a sequence polymorphism because the change is not 
detected in 35 other CF chromosomes without the ΔF508 
deletion (5 of them from group IIIb) and 19 normal 
chromosomes (including 5 from group IIIb). To detect 
this mutation, the genomic DNA region may be amplified 
under conditions of Table 6 by PCR with primers 11i-5 
(5'-CAACTGTGGTTAAAGCAATAGTGT-3') and 11i-3 
(5'-GCACAGATTCTGAGTAACCATAAT-3') and examined for the 
presence of a MboI (Sau3A) site created by the nucleotide 
change; the uncut (normal) form is 419 bp in length and 
the digestion products (from the mutant form) are 241 and 
178 bp. 

3.8.7 MUTATIONS IN EXON 12 

In the Y563N mutation a T to A change is detected at 
nucleotide position 1820 in exon 12. This switch would 
result in a change from Tyr to Asn at amino acid position 
563. It is found in a single family with 2 PS patients 
but the mutation in the other chromosome is unknown. We 
think Y563N is probably a missense mutation because (1) 
the T to A change is not found in 59 other CF 
chromosomes, with 8 having the same haplotype (IIa) and 
30 having ΔF508; and (2) this alteration is not found in 
54 normal chromosomes, with 39 having the IIa haplotype. 
Unfortunately, the amino acid change is not drastic 
enough to permit a strong argument. This putative 
mutation can be detected by ASO hybridization with a 
normal (5'-AGCAGTATACAAAGATGC-3') and a mutant 
(5'-AGCAGTAAACAAAGATGC-3') oligonucleotide probe. The 
washing condition is 54°C with 2XSSC. 

In the P574H mutation the C at nucleotide position 
1853 is changed to A. Although the amino acid Pro at 
this position is not highly conserved across different 
ATP-binding folds, a change to His could be a drastic 
substitution. This change is not detected in 52 other CF 
chromosomes nor 15 normal chromosomes, 4 of which have 
the same group IV haplotype. Based on these arguments, 
we believe P574H is a mutation. To detect this putative 
mutation, one may use the following ASOs: 
5'-GACTCTCCTTTTGGA-3' for the normal and 
5'-GACTCTCATTTTGGA-3' for the mutant. Washing should be 
done at 47° in 2XSSC. 

In the L1077P mutation, the T at nucleotide position 
3362 is changed to C. This results in a change of the 
amino acid Leu to Pro at amino acid position 1077 in Figure 1. 
As with the other mutations in this exon, the genomic DNA 
is amplified by use of the primers of Table 5, namely 
17bi-5 and 17bi-3. The reaction conditions for amplifying 
the genomic DNA are set out in Table 6. 

The Y1092X mutation involves a change of C at 
nucleotide position 3408 to A. This would result in 
protein synthesis termination at amino acid position 1092. 
Hence the amino acid Tyr is not present in the truncated 
polypeptide. As with the above procedures, the primers 
used in amplifying this mutation are 17bi-5 and 17bi-3. 

3.8.8 MUTATIONS IN EXON 19 

3659 del C is a frameshift mutation in exon 19 in a 
single CF chromosome (Toronto family #2); deletion of C 
at nucleotide position 3659 or 3660; haplotype IIa; not 
present in 57 non-ΔF508 CF chromosomes (7 from IIa) and 
50 N chromosomes (43 from IIa); the deletion may be 
detected by PCR with a common oligonucleotide primer 19i-5 
(see Table 5) and 2 ASO primers, HSC8 
(5'-GTATGGTTTGGTTGACTTGG-3') for the normal and HSC9 
(5'-GTATGGTTTGGTTGACTTGT-3') for the mutant allele; the PCR 
condition is as usual except the annealing temperature is 
at 60°C to improve specificity. 

3.8.9 MUTATIONS IN INTRON 4 

In the 621+1G -> T mutation there is a single bp 
change affecting the splice site (GT -> TT) at the 3' end 
of exon 4; this mutation is detected in 5 French-Canadian 
CF chromosomes (one each in Toronto families #22, 23, 26, 
36 and 53) but not in 33 other CF chromosomes (18 from 
the same group, group I) and 29 N chromosomes (13 from 
group I); the mutation creates a MseI site; genomic DNA 
may be amplified by the 2 intron primers, 4i-5 and 4i-3, 
and cut with MseI to distinguish the normal and mutant 
alleles; the normal would give 4 fragments of 33, 35, 71 
and 298 bp in size; the 298 bp fragment in the mutant is 
cleaved by the enzyme to give fragments of 54 and 244 bp. 

3.8.10 MUTATIONS IN INTRON 5 

In the 711+1G -> T mutation this G to T switch 
occurs at the splice junction after exon 5. The mutation 
is found on the mother's CF chromosome in family #22, a 
French Canadian family from Chicoutimi. The other 
mutation in this family is 621+1G -> T. 

3.8.11 MUTATIONS IN INTRON 10 

In the 1717-1G -> A mutation a putative splice 
mutation is found in front of exon 11. This mutation is 
located at the last nucleotide of the intron before exon 
11. The mutation may be detected with the following 
ASOs: normal ASO = 5'-TTTGGTAATAGGACATCTCC-3'; mutant ASO = 
5'-TTTGGTAATAAGACATCTCC-3'. The washing conditions after 
hybridization are 5XSSC twice for 10 min at room temp, 
2XSSC twice for 30 min at 47° for the mutant and 2XSSC 
twice for 30 min at 48° for the normal ASO. We have only 
1 single example from an Arabic patient and there is no 
haplotype data. The mutation is not found in 5 other 
Arabic, 21 Jewish, and 41 Canadian CF chromosomes, nor in 
13 normal chromosomes. 

3.9 DNA SEQUENCE POLYMORPHISMS 

Nucleotide position      Amino acid change 

1540 (A or G)            Met or Val 
1716 (G or A)            no change (Glu) 
2694 (T or G)            no change (Thr) 
356 (G or A)             Arg or Gln 

A polymorphism is detected at nucleotide position 1540: 
the A residue can be substituted by G, changing the 
corresponding amino acid from Met to Val. At position 
2694 the T residue can be a G, although this does not 
change the encoded amino acid. The polymorphism may be 
detected by restriction enzymes AvaII or Sau96I. These 
changes are present in the normal population and show 
good correlation with haplotypes but not with CF disease. 
10 There can be a G to A change for the last nucleotide 

of exon 10 (nucleotide position 1716) . We think that 
this nucleotide substitution is a sequence polymorphism 
because (a) it does not alter the amino acid, (b) it" is 
unlikely to cause a splicing defect and (c) it occurs on 

15 some normal chromosomes. In two Canadian families, this 
rare allele is found associated with haplotype Illb. 

The more common mucleotide at 356 (G) is found to be 
changed to A in the father's normal chromosome in family 
#54. The amino acid changes from Arg to Gin. 

4.0 CFTR PROTEIN

As discussed with respect to the DNA sequence of
Figure 1, analysis of the sequence of the overlapping
cDNA clones predicted an unprocessed polypeptide of 1480
amino acids with a molecular mass of 168,138 daltons. As
later described, due to polymorphisms in the protein, the
molecular weight of the protein can vary due to possible
substitutions or deletion of certain amino acids. The
molecular weight will also change due to the addition of
carbohydrate units to form a glycoprotein. It is also
understood that the functional protein in the cell will
be similar to the unprocessed polypeptide, but may be
modified due to cell metabolism.

Accordingly, purified normal CFTR polypeptide is
characterized by a molecular weight of about 170,000
daltons and by having epithelial cell transmembrane ion
conductance activity. The normal CFTR polypeptide, which
is substantially free of other human proteins, is encoded
by the aforementioned DNA sequences and, according to one
embodiment, that of Figure 1. Such polypeptide displays
the immunological or biological activity of normal CFTR
polypeptide. As will be later discussed, the CFTR
polypeptide and fragments thereof may be made by chemical
or enzymatic peptide synthesis or expressed in an
appropriate cultured cell system. The invention provides
purified 507 mutant CFTR polypeptide which is
characterized by cystic fibrosis-associated activity in
human epithelial cells. Such 507 mutant CFTR
polypeptide, as substantially free of other human
proteins, can be encoded by the 507 mutant DNA sequence.

4.1 STRUCTURE OF CFTR

The most characteristic feature of the predicted
protein is the presence of two repeated motifs, each of
which consists of a set of amino acid residues capable of
spanning the membrane several times followed by sequence
resembling consensus nucleotide (ATP)-binding folds
(NBFs) (Figures 11, 12 and 15). These characteristics
are remarkably similar to those of the mammalian
multidrug resistant P-glycoprotein and a number of other
membrane-associated proteins, thus implying that the
predicted CF gene product is likely to be involved in the
transport of substances (ions) across the membrane and is
probably a member of a membrane protein super family.

Figure 13 is a schematic model of the predicted CFTR
protein. In Figure 13, cylinders indicate membrane
spanning helices and hatched spheres indicate NBFs. The
stippled sphere is the polar R-domain. The 6 membrane
spanning helices in each half of the molecule are
depicted as cylinders. The inner cytoplasmically
oriented NBFs are shown as hatched spheres with slots to
indicate the means of entry by the nucleotide. The large
polar R-domain which links the two halves is represented
by a stippled sphere. Charged individual amino acids
within the transmembrane segments and on the R-domain
surface are depicted as small circles containing the
charge sign. Net charges on the internal and external
loops joining the membrane cylinders and on regions of
the NBFs are contained in open squares. Sites for
phosphorylation by protein kinases A or C are shown by
closed and open triangles respectively. K, R, H, D, and E
are standard nomenclature for the amino acids lysine,
arginine, histidine, aspartic acid and glutamic acid
respectively.

Each of the predicted membrane-associated regions of
the CFTR protein consists of 6 highly hydrophobic
segments capable of spanning a lipid bilayer according to
the algorithms of Kyte and Doolittle and of Garnier et al
[J. Mol. Biol. 120, 97 (1978)] (Figure 13). The membrane-
associated regions are each followed by a large
hydrophilic region containing the NBFs. Based on
sequence alignment with other known nucleotide binding
proteins, each of the putative NBFs in CFTR comprises at
least 150 residues (Figure 13). The 3 bp deletion at
position 507 as detected in CF patients is located
between the 2 most highly conserved segments of the first
NBF in CFTR. The amino acid sequence identity between
the region surrounding the isoleucine deletion and the
corresponding regions of a number of other proteins
suggests that this region is of functional importance
(Figure 15). A hydrophobic amino acid, usually one with
an aromatic side chain, is present in most of these
proteins at the position corresponding to I507 of the
CFTR protein. It is understood that amino acid
polymorphisms may exist as a result of DNA polymorphisms.
Similarly, mutations at the other positions in the
protein are also of functional importance. Such
additional mutations include substitutions of:

i) Glu for Gly at amino acid position 85;
ii) Thr for Ile at amino acid position 148;
iii) Arg for Gly at amino acid position 178;
iv) Glu for Ala at amino acid position 455;
v) stop codon for Gln at amino acid position 493;
vi) stop codon for Gly at amino acid position 542;
vii) Arg for Ser or Ile for Ser at amino acid
position 549;
viii) Asp for Gly at amino acid position 551;
ix) Thr for Arg at amino acid position 560;
x) Asn for Tyr at amino acid position 563;
xi) His for Pro at amino acid position 574;
xii) Pro for Leu at amino acid position 1077;
xiii) stop codon for Tyr at amino acid position
1092.
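
For convenience, the substitutions listed above can be held in
a small table and rendered in the common one-letter shorthand
(original residue, position, new residue, with X for a stop
codon). The following is a minimal sketch; the shorthand names
it produces are a notational assumption and are not used in
the list itself.

```python
# One-letter codes for the residues appearing in the list above;
# "Stop" is rendered as X by convention (an assumption of this sketch).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Gln": "Q",
    "Glu": "E", "Gly": "G", "His": "H", "Ile": "I", "Leu": "L",
    "Pro": "P", "Ser": "S", "Thr": "T", "Tyr": "Y", "Stop": "X",
}

# (new residue, original residue, amino acid position), as listed.
SUBSTITUTIONS = [
    ("Glu", "Gly", 85), ("Thr", "Ile", 148), ("Arg", "Gly", 178),
    ("Glu", "Ala", 455), ("Stop", "Gln", 493), ("Stop", "Gly", 542),
    ("Arg", "Ser", 549), ("Ile", "Ser", 549), ("Asp", "Gly", 551),
    ("Thr", "Arg", 560), ("Asn", "Tyr", 563), ("His", "Pro", 574),
    ("Pro", "Leu", 1077), ("Stop", "Tyr", 1092),
]


def short_name(new, old, position):
    """Render 'new for old at position' as old-position-new, e.g. G551D."""
    return f"{THREE_TO_ONE[old]}{position}{THREE_TO_ONE[new]}"


if __name__ == "__main__":
    print(", ".join(short_name(*s) for s in SUBSTITUTIONS))
```

Position 549 appears twice because the text gives two
alternative replacements for the same serine.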

Figure 15 shows alignment of the 3 most conserved 
segments of the extended NBF's of CFTR with comparable 
regions of other proteins. These 3 segments consist of 
residues 433-473, 488-513, and 542-584 of the N-terminal
half and 1219-1259, 1277-1302, and 1340-1382 of the
C-terminal half of CFTR. The heavy overlining points out
the regions of greatest similarity. Additional general 
homology can be seen even without the introduction of 
gaps. 

Despite the overall symmetry in the structure of the
protein and the sequence conservation of the NBFs,
sequence homology between the two halves of the predicted
CFTR protein is modest. This is demonstrated in Figure
12, where amino acids 1-1480 are represented on each
axis. Lines on either side of the identity diagonal
indicate the positions of internal similarities.
Therefore, while four sets of internal sequence identity
can be detected as shown in Figure 12, using the Dayhoff
scoring matrix as applied by Lawrence et al. [C. B.
Lawrence, D. A. Goldman, and R. T. Hood, Bull. Math. Biol.
48, 569 (1986)], three of these are only apparent at low
threshold settings for standard deviation. The strongest
identity is between sequences at the carboxyl ends of the
NBFs. Of the 66 residues aligned, 27% are identical and
another 11% are functionally similar. The overall weak
internal homology is in contrast to the much higher
degree (>70%) in P-glycoprotein, for which a gene
duplication hypothesis has been proposed [Gros et al,
Cell 47, 371 (1986); C. Chen et al, Cell 47, 381 (1986);
Gerlach et al, Nature 324, 485 (1986); Gros et al, Mol.
Cell. Biol. 8, 2770 (1988)]. The lack of conservation in
the relative positions of the exon-intron boundaries may
argue against such a model for CFTR (Figure 2).

Since there is apparently no signal-peptide sequence
at the amino-terminus of CFTR, the highly charged
hydrophilic segment preceding the first transmembrane
sequence is probably oriented in the cytoplasm. Each of
the 2 sets of hydrophobic helices is expected to form 3
transversing loops across the membrane and little
sequence of the entire protein is expected to be exposed
to the exterior surface, except the region between
transmembrane segments 7 and 8. It is of interest to note
that the latter region contains two potential sites for
N-linked glycosylation.

Each of the membrane-associated regions is followed
by a NBF as indicated above. In addition, a highly
charged cytoplasmic domain can be identified in the
middle of the predicted CFTR polypeptide, linking the 2
halves of the protein. This domain, named the R-domain,
is operationally defined by a single large exon in which
69 of the 241 amino acids are polar residues arranged in
alternating clusters of positive and negative charges.
Moreover, 9 of the 10 consensus sequences required for
phosphorylation by protein kinase A (PKA), and 7 of
the potential substrate sites for protein kinase C (PKC)
found in CFTR are located in this exon.

4.2 FUNCTION OF CFTR

Properties of CFTR can be derived from comparison to
other membrane-associated proteins (Figure 15). In
addition to the overall structural similarity with the
mammalian P-glycoprotein, each of the two predicted
domains in CFTR also shows remarkable resemblance to the
single domain structure of hemolysin B of E. coli and the
product of the white gene of Drosophila. These latter
proteins are involved in the transport of the lytic
peptide of the hemolysin system and of eye pigment
molecules, respectively. The vitamin B12 transport
system of E. coli, BtuD, and MbpX, which is a liverwort
chloroplast gene whose function is unknown, also have a
similar structural motif. Furthermore, the CFTR protein
shares structural similarity with several of the
periplasmic solute transport systems of gram negative
bacteria where the transmembrane region and the ATP-
binding folds are contained in separate proteins which
function in concert with a third substrate-binding
polypeptide.

The overall structural arrangement of the
transmembrane domains in CFTR is similar to several
cation channel proteins and some cation-translocating
ATPases as well as the recently described adenylate
cyclase of bovine brain. The functional significance of
this topological classification, consisting of 6
transmembrane domains, remains speculative.

Short regions of sequence identity have also been
detected between the putative transmembrane regions of
CFTR and other membrane-spanning proteins.
Interestingly, there are also sequences, 18 amino acids
in length, situated approximately 50 residues from the
carboxyl terminus of CFTR and the raf serine/threonine
kinase protooncogene of Xenopus laevis which are
identical at 12 of these positions.

Finally, an amino acid sequence identity (10/13
conserved residues) has been noted between a hydrophilic
segment (position 701-713) within the highly charged R-
domain of CFTR and a region immediately preceding the
first transmembrane loop of the sodium channels in both
rat brain and eel. The charged R-domain of CFTR is not
shared with the topologically closely related P-
glycoprotein; the 241 amino acid linking-peptide is
apparently the major difference between the two proteins.

In summary, features of the primary structure of the
CFTR protein indicate its possession of properties
suitable to participation in the regulation and control
of ion transport in the epithelial cells of tissues
affected in CF. Secure attachment to the membrane in two
regions serves to position its three major intracellular
domains (nucleotide-binding folds 1 and 2 and the R-
domain) near the cytoplasmic surface of the cell membrane
where they can modulate ion movement through channels
formed either by CFTR transmembrane segments themselves
or by other membrane proteins.

In view of the genetic data, the tissue-specificity,
and the predicted properties of the CFTR protein, it is
reasonable to conclude that CFTR is directly responsible
for CF. It, however, remains unclear how CFTR is
involved in the regulation of ion conductance across the
apical membrane of epithelial cells.

It is possible that CFTR serves as an ion channel
itself. As depicted in Figure 13, 10 of the 12
transmembrane regions contain one or more amino acids
with charged side chains, a property similar to the brain
sodium channel and the GABA receptor chloride channel
subunits, where charged residues are present in 4 of the
6, and 3 of the 4, respective membrane-associated domains
per subunit or repeat unit. The amphipathic nature of
these transmembrane segments is believed to contribute to
the channel-forming capacity of these molecules.
Alternatively, CFTR may not be an ion channel but instead
serve to regulate ion channel activities. In support of
the latter assumption, none of the purified polypeptides
from trachea and kidney that are capable of
reconstituting chloride channels in lipid membranes
[Landry et al, Science 244:1469 (1989)] appear to be CFTR
if judged on the basis of the molecular mass.
In either case, the presence of ATP-binding domains
in CFTR suggests that ATP hydrolysis is directly involved
and required for the transport function. The high
density of phosphorylation sites for PKA and PKC and the
clusters of charged residues in the R-domain may both
serve to regulate this activity. The deletion of a
phenylalanine residue in the NBF may prevent proper
binding of ATP or the conformational change which this
normally elicits and consequently result in the observed
insensitivity to activation by PKA- or PKC-mediated
phosphorylation of the CF apical chloride conductance
pathway. Since the predicted protein contains several
domains and belongs to a family of proteins which
frequently function as parts of multi-component molecular
systems, CFTR may also participate in epithelial tissue
functions of activity or regulation not related to ion
transport.

With the isolated CF gene (cDNA) now in hand it is
possible to define the basic biochemical defect in CF and
to further elucidate the control of ion transport
pathways in epithelial cells in general. Most important,
knowledge gained thus far from the predicted structure of
CFTR together with the additional information from
studies of the protein itself provide a basis for the
development of improved means of treatment of the
disease. In such studies, antibodies have been raised to
the CFTR protein as later described.

5.0 CF SCREENING

Given the knowledge of the 85, 148, 178, 455, 493,
507, 542, 549, 551, 560, 563, 574, 1077 and 1092 amino
acid position mutations and the nucleotide sequence
variants at DNA sequence positions 129, 556, 621+1,
711+1, 1717-1 and 3659 as disclosed herein, carrier
screening and prenatal diagnosis can be carried out as
follows.

The high risk population for cystic fibrosis is
Caucasians. For example, each Caucasian woman and/or man
of child-bearing age would be screened to determine if
she or he was a carrier (approximately a 5% probability
for each individual). If both are carriers, they are a
couple at risk for a cystic fibrosis child. Each child
of the at risk couple has a 25% chance of being affected
with cystic fibrosis. The procedure for determining
carrier status using the probes disclosed herein is as
follows.

For purposes of brevity, the discussion on screening
by use of one of the selected mutations is directed to
the I507 mutation. It is understood that screening can
also be accomplished using one of the other mutations or
using several of the mutations in a screening process or
mutation detection process of this section on CF
screening involving DNA diagnosis and mutation detection.

One major application of the DNA sequence
information of the normal and 507 mutant CF gene is in
the area of genetic testing, carrier detection and
prenatal diagnosis. Individuals carrying mutations in
the CF gene (disease carrier or patients) may be detected
at the DNA level with the use of a variety of techniques.
The genomic DNA used for the diagnosis may be obtained
from body cells, such as those present in peripheral
blood, urine, saliva, tissue biopsy, surgical specimen
and autopsy material. The DNA may be used directly for
detection of specific sequence or may be amplified
enzymatically in vitro by using PCR [Saiki et al. Science
230: 1350-1353 (1985); Saiki et al. Nature 324: 163-166
(1986)] prior to analysis. RNA or its cDNA form may also
be used for the same purpose. Recent reviews of this
subject have been presented by Caskey [Science 236:
1223-8 (1987)] and by Landegren et al [Science 242: 229-
237 (1988)].

The detection of specific DNA sequence may be
achieved by methods such as hybridization using specific
oligonucleotides [Wallace et al. Cold Spring Harbour
Symp. Quant. Biol. 51: 257-261 (1986)], direct DNA
sequencing [Church and Gilbert, Proc. Nat. Acad. Sci.
U. S. A. 81: 1991-1995 (1984)], the use of restriction
enzymes [Flavell et al. Cell 15: 25 (1978); Geever et al
Proc. Nat. Acad. Sci. U. S. A. 78: 5081 (1981)],
discrimination on the basis of electrophoretic mobility
in gels with denaturing reagent [Myers and Maniatis, Cold
Spring Harbour Symp. Quant. Biol. 51: 275-284 (1986)],
RNase protection [Myers, R. M., Larin, J., and T.
Maniatis Science 230: 1242 (1985)], chemical cleavage
[Cotton et al Proc. Nat. Acad. Sci. U. S. A. 85: 4397-
4401 (1988)] and the ligase-mediated detection procedure
[Landegren et al Science 241:1077 (1988)].

Oligonucleotides specific to normal or mutant
sequences are chemically synthesized using commercially
available machines, labelled radioactively with isotopes
(such as 32P) or non-radioactively (with tags such as
biotin [Ward and Langer et al. Proc. Nat. Acad. Sci. U.
S. A. 78: 6633-6657 (1981)]), and hybridized to individual
DNA samples immobilized on membranes or other solid
supports by dot-blot or transfer from gels after
electrophoresis. The presence or absence of these
specific sequences is visualized by methods such as
autoradiography or fluorometric (Landegren et al, 1988,
supra) or colorimetric reactions [Gebeyehu et al. Nucleic
Acids Research 15: 4513-4534 (1987)]. An embodiment of
this oligonucleotide screening method has been applied in
the detection of the I507 deletion as described herein.
Sequence differences between normal and mutants may
be revealed by the direct DNA sequencing method of Church
and Gilbert (supra). Cloned DNA segments may be used as
probes to detect specific DNA segments. The sensitivity
of this method is greatly enhanced when combined with PCR
[Wrichnik et al, Nucleic Acids Res. 15:529-542 (1987);
Wong et al, Nature 330:384-386 (1987); Stoflet et al,
Science 239:491-494 (1988)]. In the latter procedure, a
sequencing primer which lies within the amplified
sequence is used with double-stranded PCR product or
single-stranded template generated by a modified PCR.
The sequence determination is performed by conventional
procedures with radiolabeled nucleotides or by automatic
sequencing procedures with fluorescent-tags.

Sequence alterations may occasionally generate
fortuitous restriction enzyme recognition sites which are
revealed by the use of appropriate enzyme digestion
followed by conventional gel-blot hybridization
[Southern, J. Mol. Biol. 98: 503 (1975)]. DNA fragments
carrying the site (either normal or mutant) are
detected by their reduction in size or increase of
corresponding restriction fragment numbers. Genomic DNA
samples may also be amplified by PCR prior to treatment
with the appropriate restriction enzyme; fragments of
different sizes are then visualized under UV light in the
presence of ethidium bromide after gel electrophoresis.

Genetic testing based on DNA sequence differences
may be achieved by detection of alteration in
electrophoretic mobility of DNA fragments in gels with or
without denaturing reagent. Small sequence deletions and
insertions can be visualized by high resolution gel
electrophoresis. For example, the PCR product with the 3
bp deletion is clearly distinguishable from the normal
sequence on an 8% non-denaturing polyacrylamide gel. DNA
fragments of different sequence compositions may be
distinguished on denaturing formamide gradient gel in
which the mobilities of different DNA fragments are
retarded in the gel at different positions according to
their specific "partial-melting" temperatures (Myers,
supra). In addition, sequence alterations, in particular
small deletions, may be detected as changes in the
migration pattern of DNA heteroduplexes in non-denaturing
gel electrophoresis, as have been detected for the 3 bp
(I507) mutation and in other experimental systems
[Nagamine et al, Am. J. Hum. Genet. 45:337-339 (1989)].
Alternatively, a method of detecting a mutation
comprising a single base substitution or other small
change could be based on differential primer length in a
PCR. For example, one invariant primer could be used in
addition to a primer specific for a mutation. The PCR
products of the normal and mutant genes can then be
differentially detected in acrylamide gels.

Sequence changes at specific locations may also be
revealed by nuclease protection assays, such as RNase
(Myers, supra) and S1 protection [Berk, A. J., and P. A.
Sharpe Proc. Nat. Acad. Sci. U. S. A. 75: 1274 (1978)],
the chemical cleavage method (Cotton, supra) or the
ligase-mediated detection procedure (Landegren, supra).

In addition to conventional gel-electrophoresis and
blot-hybridization methods, DNA fragments may also be
visualised by methods where the individual DNA samples
are not immobilized on membranes. The probe and target
sequences may be both in solution or the probe sequence
may be immobilized [Saiki et al, Proc. Natl. Acad. Sci.
USA, 86:6230-6234 (1989)]. A variety of detection
methods, such as autoradiography involving radioisotopes,
direct detection of radioactive decay (in the presence or
absence of scintillant), spectrophotometry involving
colorigenic reactions and fluorometry involving
fluorogenic reactions, may be used to identify specific
individual genotypes.

Since more than one mutation is anticipated in the
CF gene, such as I507 and F508, a multiplex system is an
ideal protocol for screening CF carriers and detection of
specific mutations. For example, a PCR with multiple,
specific oligonucleotide primers and hybridization
probes may be used to identify all possible mutations at
the same time [Chamberlain et al. Nucleic Acids Research
16: 1141-1155 (1988)]. The procedure may involve
immobilized sequence-specific oligonucleotide probes
(Saiki et al, supra).

5.1 DETECTING THE ΔI507 MUTATION

These detection methods may be applied to prenatal
diagnosis using amniotic fluid cells, chorionic villi
biopsy or sorting fetal cells from maternal circulation.
The test for CF carriers in the population may be
incorporated as an essential component in a broad-scale
genetic testing program for common diseases.

According to an embodiment of the invention, the
portion of the DNA segment that is informative for a
mutation, such as the mutation according to this
embodiment, that is, the portion that immediately
surrounds the I507 deletion, can then be amplified by
using standard PCR techniques [as reviewed in Landegren,
Ulf, Robert Kaiser, C. Thomas Caskey, and Leroy Hood, DNA
Diagnostics - Molecular Techniques and Automation, in
Science 242: 229-237 (1988)]. It is contemplated that
the portion of the DNA segment which is used may be a
single DNA segment or a mixture of different DNA
segments. A detailed description of this technique now
follows.

A specific region of genomic DNA from the person or
fetus is to be screened. Such specific region is defined
by the oligonucleotide primers C16B
(5'GTTTTCCTGGATTATGCCTGGCAC3') and C16D
(5'GTTGGCATGCTTTGATGACGCTTC3') or as shown in Figure 18
by primers 10i-5 and 10i-3. The specific regions using
10i-5 and 10i-3 were amplified by the polymerase chain
reaction (PCR). 200-400 ng of genomic DNA, from either
cultured lymphoblasts or peripheral blood samples of CF
individuals and their parents, were used in each PCR with
the oligonucleotide primers indicated above. The
oligonucleotides were purified with Oligonucleotide
Purification Cartridges™ (Applied Biosystems) or NENSORB™
PREP columns (Dupont) with procedures recommended by the
suppliers. The primers were annealed at 55°C for 30 sec,
extended at 72°C for 60 sec (with 2 units of Taq DNA
polymerase) and denatured at 94°C for 60 sec, for 30
cycles with a final cycle of 7 min for extension in a
Perkin-Elmer/Cetus automatic thermocycler with a Step-
Cycle program (transition setting at 1.5 min). Portions
of the PCR products were separated by electrophoresis on
1.4% agarose gels and transferred to Zetabind™ (Biorad)
membrane according to standard procedures.

The normal and ΔI507 oligonucleotide probes of
Figure 19 (10 ng each) are labeled separately with 10
units of T4 polynucleotide kinase (Pharmacia) in a 10 µl
reaction containing 50 mM Tris-HCl (pH 7.6), 10 mM MgCl2,
0.5 mM dithiothreitol, 10 mM spermidine, 1 mM EDTA and
30-40 µCi of γ[32P]-ATP for 20-30 min at 37°C. The
unincorporated radionucleotides were removed with a
Sephadex G-25 column before use. The hybridization
conditions were as described previously [J.M. Rommens et
al Am. J. Hum. Genet. 43, 645 (1988)] except that the
temperature can be 37°C. The membranes are washed twice
at room temperature with 5 x SSC and twice at 39°C with
2 x SSC (1 x SSC = 150 mM NaCl and 15 mM Na citrate).
Autoradiography is performed at room temperature
overnight. Autoradiographs are developed to show the
hybridization results of genomic DNA with the 2 specific
oligonucleotide probes. Probe C normal detects the
normal DNA sequence and Probe C ΔI507 detects the mutant
sequence.
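
The salt concentrations of the wash buffers follow directly
from the definition of 1 x SSC given above. The sketch below
simply scales that definition; the names are hypothetical and
the wash list restates the two membrane washes described in
this paragraph.

```python
# Composition of 1 x SSC in mM, as defined in the text.
ONE_X_SSC_MM = {"NaCl": 150.0, "Na citrate": 15.0}

# (SSC strength, number of washes, temperature) for the membrane washes.
WASHES = [(5, 2, "room temperature"), (2, 2, "39 C")]


def ssc_composition(strength):
    """Scale the 1 x SSC recipe to the requested strength (mM)."""
    return {salt: mm * strength for salt, mm in ONE_X_SSC_MM.items()}


if __name__ == "__main__":
    for strength, repeats, temperature in WASHES:
        print(f"{repeats} x wash, {strength} x SSC, {temperature}:",
              ssc_composition(strength))
```

Lowering the SSC strength from 5 x to 2 x lowers the salt
concentration, which makes the second pair of washes the more
stringent ones.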

Genomic DNA sample from each family member can, as
explained, be amplified by the polymerase chain reaction
using the intron sequences of Figure 18 and the products
separated by electrophoresis on a 1.4% agarose gel and
then transferred to Zetabind (Biorad) membrane according
to standard procedures. The 3 bp deletion of ΔI507 can be
revealed by a very convenient polyacrylamide gel
electrophoresis procedure. When the PCR products
generated by the above-mentioned 10i-5 and 10i-3 primers
are applied to a 5% polyacrylamide gel, electrophoresed
for 3 hrs at 20 V/cm in a 90 mM Tris-borate buffer (pH
8.3), DNA fragments of a different mobility are clearly
detectable for individuals without the 3 bp deletion,
heterozygous or homozygous for the deletion.

As already explained with respect to Figure 20, the
PCR amplified genomic DNA can be subjected to gel
electrophoresis to identify the 3 bp deletion. As shown
in Figure 20, in the four lanes the first lane is a
control with a normal/ΔF508 deletion. The next lane is
the father with a normal/ΔI507 deletion. The third lane
is the mother with a normal/ΔF508 deletion and the fourth
lane is the child with a ΔF508/ΔI507 deletion. The
homoduplexes show up as solid bands across the base of
each lane. In lanes 1 and 3, the two heteroduplexes show
up very clearly as two spaced apart bands. In lane 2, the
father's ΔI507 mutation shows up very clearly, whereas in
the fourth lane, the child with the adjacent 507, 508
mutations, there are no distinguishable heteroduplexes.
Hence the showing is at the homoduplex line. Since the
father in lane 2 and the mother in lane 3 show
heteroduplex banding and the child does not, this
indicates that the child is either normal or a patient.
This can be further checked if needed, such as in
embryonic analysis, by mixing the 507 and 508 probes to
determine the presence of the ΔI507 and ΔF508 mutations.

Similar alteration in gel mobility for
heteroduplexes formed during PCR has also been reported
for experimental systems where small deletions are
involved (Nagamine et al, supra). These mobility shifts
may be used in general as the basis for the non-
radioactive genetic screening tests.

5.2 CF SCREENING PROGRAMS

It is appreciated that approximately 1% of the
carriers can be detected using the specific ΔI507 probes
of this particular embodiment of the invention. Thus, if
an individual tested is not a carrier using the ΔI507
probes, their carrier status cannot be excluded; they
may carry some other mutation, such as the ΔF508 as
previously noted. However, if both the individual and
the spouse of the individual tested are carriers of the
ΔI507 mutation, it can be stated with certainty that they
are an at risk couple. The sequence of the gene as
disclosed herein is an essential prerequisite for the
determination of the other mutations.
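
The statement that a negative ΔI507 result does not exclude
carrier status can be illustrated numerically. The following
is a minimal sketch, assuming the approximately 5% carrier
probability and the approximately 1% detection figure quoted
in this section, and further assuming that the test gives no
false positives; the function name is hypothetical.

```python
def residual_carrier_risk(prior=0.05, detected_fraction=0.01):
    """Probability of being a carrier after a negative probe result.

    prior: carrier probability before testing (about 5% in the text).
    detected_fraction: share of carriers the probe detects (about 1%).
    Assumes non-carriers never test positive.
    """
    missed_carriers = prior * (1.0 - detected_fraction)
    non_carriers = 1.0 - prior
    return missed_carriers / (missed_carriers + non_carriers)


if __name__ == "__main__":
    print(round(residual_carrier_risk(), 4))
```

Under these assumptions the residual risk stays very close to
the starting 5%, which is why testing for additional
mutations is needed before carrier status can be ruled out.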

Prenatal diagnosis is a logical extension of carrier
screening. A couple can be identified as at risk for
having a cystic fibrosis child in one of two ways: if
they already have a cystic fibrosis child, they are both,
by definition, obligate carriers of the defective CFTR
gene, and each subsequent child has a 25% chance of being
affected with cystic fibrosis. A major advantage of the
present invention is that it eliminates the need for
family pedigree analysis; according to this invention, a
gene mutation screening program as outlined above or
other similar method can be used to identify a genetic
mutation that leads to a protein with altered function.
This is not dependent on prior ascertainment of the
family through an affected child. Fetal DNA samples, for
example, can be obtained, as previously mentioned, from
amniotic fluid cells and chorionic villi specimens.
Amplification by standard PCR techniques can then be
performed on this template DNA.

If both parents are shown to be carriers with the
ΔI507 deletion, the interpretation of the results would
be the following. If there is hybridization of the fetal
DNA to the normal probe, the fetus will not be affected
with cystic fibrosis, although it may be a CF carrier
(50% probability for each fetus of an at risk couple). If
the fetal DNA hybridizes only to the ΔI507 deletion probe
and not to the normal probe, the fetus will be affected
with cystic fibrosis.
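
The interpretation rules above can be written out as a small
decision function, together with the expected distribution of
fetal genotypes for two ΔI507 carrier parents. This is a
sketch only: the function names and allele labels are
hypothetical, and it assumes that both parents carry the
ΔI507 deletion and no other CF mutation.

```python
from fractions import Fraction
from itertools import product


def interpret_fetal_result(binds_normal, binds_mutant):
    """Map hybridization to the normal and deletion probes to a status."""
    if binds_normal and binds_mutant:
        return "carrier, not affected"
    if binds_normal:
        return "not affected, not a carrier"
    if binds_mutant:
        return "affected"
    return "no signal; assay should be repeated"


def offspring_distribution(parent_a=("N", "dI507"), parent_b=("N", "dI507")):
    """Probability of 0, 1 or 2 mutant alleles in the fetus."""
    counts = {}
    for allele_a, allele_b in product(parent_a, parent_b):
        n_mutant = (allele_a != "N") + (allele_b != "N")
        counts[n_mutant] = counts.get(n_mutant, 0) + 1
    total = sum(counts.values())
    return {n: Fraction(c, total) for n, c in counts.items()}


if __name__ == "__main__":
    print(offspring_distribution())
    print(interpret_fetal_result(True, True))
```

The enumeration reproduces the 25% affected and 50% carrier
figures quoted in the text for an at risk couple.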

It is appreciated that, for this and other mutations
in the CF gene, a range of different specific procedures
can be used to provide a complete diagnosis for all
potential CF carriers or patients. A complete
description of these procedures is given later.

The invention therefore provides a method and kit
for determining if a subject is a CF carrier or CF
patient. In summary, the screening method comprises the
steps of:

providing a biological sample of the subject to be 
screened; and providing an assay for detecting in the 
biological sample, the presence of at least a member from 
the group consisting of a 507 mutant CF gene, 507 mutant 
CF gene products and mixtures thereof. 

The method may be further characterized by including
at least one more nucleotide probe which is a different
DNA sequence fragment of, for example, the DNA of Figure
1, or a different DNA sequence fragment of human
chromosome 7 and located to either side of the DNA
sequence of Figure 1. In this respect, the DNA fragments
of the intron portions of Figure 2 are useful in further
confirming the presence of the mutation. Unique aspects
of the introns at the exon boundaries may be relied upon
in screening procedures to further confirm the presence
of the mutation at the I507 position or other mutant
positions.

A kit, according to an embodiment of the invention, 
suitable for use in the screening technique and for 
assaying for the presence of the mutant CF gene by an 
immunoassay comprises: 

(a) an antibody which specifically binds to a gene 
product of the mutant CF gene having a mutation at one of 
the positions of 85, 148, 178, 455, 493, 507, 542, 549, 
551, 560, 563, 574, 1077 and 1092; 

(b) reagent means for detecting the binding of the 
antibody to the gene product; and 

(c) the antibody and reagent means each being
present in amounts effective to perform the immunoassay.

The kit comprises [...] hybridization techniques.

(b) reagent means for detecting [...]

[...] arrangement of the protein in [...] protein in the
cell membrane [...]

3. The structure-function relationships of [...]
specific antibodies. For example, [...] introduce into
cells [...] charged cytoplasmic loops [...]
transmembrane sequences as well as [...]
regulatory mechanisms and potentially suggest means
of modulating the activity of the defective protein
in a CF patient.

4. Antibodies with the appropriate avidity also
enable immunoprecipitation and immuno-affinity
purification of the protein. Immunoprecipitation
will facilitate characterization of synthesis and
post-translational modification including ATP
binding and phosphorylation. Purification will be
required for studies of protein structure and for
reconstitution of its function, as well as
protein-based therapy.

In order to prepare the antibodies, fusion proteins
containing defined portions of any one of the mutant CFTR
polypeptides can be synthesized in bacteria by expression
of the corresponding mutant DNA sequence in a suitable
cloning vehicle. Smaller peptides may be synthesized
chemically. The fusion proteins can be purified, for
example, by affinity chromatography on
glutathione-agarose, and the peptides coupled to a carrier
protein (hemocyanin), mixed with Freund's adjuvant and
injected into rabbits. Following booster injections at
bi-weekly intervals, the rabbits are bled and sera
isolated. The developed polyclonal antibodies in the sera
may then be combined with the fusion proteins.
Immunoblots are then formed by staining with, for example,
alkaline phosphatase-conjugated second antibody in
accordance with the procedure of Blake et al, Anal.
Biochem. 136:175 (1984).

Thus, it is possible to raise polyclonal antibodies
specific for both fusion proteins containing portions of
the mutant CFTR protein and peptides corresponding to
short segments of its sequence. Similarly, mice can be
injected with KLH conjugates of peptides to initiate the
production of monoclonal antibodies to corresponding
segments of mutant CFTR protein.

As for the generation of monoclonal antibodies,
immunogens for the raising of monoclonal antibodies
(mAbs) to the mutant CFTR protein are bacterial fusion
proteins [Smith et al, Gene 67:31 (1988)] containing
portions of the CFTR polypeptide or synthetic peptides
corresponding to short (12 to 25 amino acids in length)
segments of the mutant sequence. The essential
methodology is that of Kohler and Milstein [Nature
256:495 (1975)].

Balb/c mice are immunized by intraperitoneal
injection with 500 ng of pure fusion protein or synthetic
peptide in incomplete Freund's adjuvant. A second
injection is given after 14 days, a third after 21 days
and a fourth after 28 days. Individual animals so
immunized are sacrificed one, two and four weeks
following the final injection. Spleens are removed,
their cells dissociated, collected and fused with
Sp2/0-Ag14 myeloma cells according to Gefter et al,
Somatic Cell Genet. 3:231 (1977). The fusion mixture is
distributed in culture medium selective for the
propagation of fused cells, which are grown until they are
about 25% confluent. At this time, culture supernatants
are tested for the presence of antibodies reacting with a
particular CFTR antigen. An alkaline phosphatase
labelled anti-mouse second antibody is then used for
detection of positives. Cells from positive culture
wells are then expanded in culture, their supernatants
collected for further testing and the cells stored deep
frozen in cryoprotectant-containing medium. To obtain
large quantities of a mAb, producer cells are injected
into the peritoneum at 5 x 10^6 cells per animal, and
ascites fluid is obtained. Purification is by
chromatography on Protein G- or Protein A-agarose
according to Ey et al, Immunochemistry 15:429 (1977).

Reactivity of these mAbs with the mutant CFTR
protein can be confirmed by polyacrylamide gel
electrophoresis of membranes isolated from epithelial
cells in which it is expressed, followed by
immunoblotting [Towbin et al, Proc. Natl. Acad. Sci. USA
76:4350 (1979)].

In addition to the use of monoclonal antibodies
specific for the particular mutant domain of the CFTR
protein to probe their individual functions, other mAbs,
which can distinguish between the normal and mutant forms
of CFTR protein, are used to detect the mutant protein in
epithelial cell samples obtained from patients, such as
nasal mucosa biopsy "brushings" [R. De-Lough and J.
Rutland, J. Clin. Pathol. 42:613 (1989)] or skin biopsy
specimens containing sweat glands.

Antibodies capable of this distinction are obtained
by differentially screening hybridomas from paired sets
of mice immunized with a peptide containing, for example,
the isoleucine at amino acid position 507 (e.g.
GTIKENIIFGVSY) or a peptide which is identical except for
the absence of I507 (GTIKENIFGVSY). mAbs capable of
recognizing the other mutant forms of CFTR protein
present in patients in addition to or instead of the I507
deletion are obtained using similar monoclonal antibody
production strategies.
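
The relationship between the two immunizing peptides can be checked
with a short Python sketch. It is illustrative only: the function
name is hypothetical, and the residue numbering is an assumption
derived from the text, namely that the second isoleucine of the "II"
pair in the normal peptide is position 507, which places the first
residue of the peptide at position 500.

```python
NORMAL_PEPTIDE = "GTIKENIIFGVSY"    # contains the isoleucine at position 507
DELETION_PEPTIDE = "GTIKENIFGVSY"   # identical except for the absence of I507

# Assumption: the second I of the "II" pair is residue 507.
FIRST_POSITION = 507 - (NORMAL_PEPTIDE.index("II") + 1)


def delete_residue(peptide, first_position, position):
    """Return the peptide with the residue at the numbered position removed."""
    index = position - first_position
    if not 0 <= index < len(peptide):
        raise ValueError("position lies outside the peptide")
    return peptide[:index] + peptide[index + 1:]


if __name__ == "__main__":
    print(FIRST_POSITION)
    print(delete_residue(NORMAL_PEPTIDE, FIRST_POSITION, 507))
```

Because positions 506 and 507 are both isoleucine, removing either
residue yields the same 12-residue peptide, so the two deletions
cannot be told apart at the peptide level.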

Antibodies to normal and CF versions of CFTR protein
and of segments thereof are used diagnostically in
immunocytochemical and immunofluorescence light
microscopy and immunoelectron microscopy to demonstrate
the tissue, cellular and subcellular distribution of CFTR
within the organs of CF patients, carriers and non-CF
individuals.

Antibodies are used therapeutically to modulate, by
promotion, the activity of the CFTR protein in CF patients
and in cells of CF patients. Possible modes of such
modulation might involve stimulation due to cross-linking
of CFTR protein molecules with multivalent antibodies in
analogy with stimulation of some cell surface membrane
receptors, such as the insulin receptor [O'Brien et al,
Euro. Mol. Biol. Organ. J. 6:4003 (1987)], epidermal
growth factor receptor [Schreiber et al, J. Biol. Chem.
258:846 (1983)] and T-cell receptor-associated molecules
such as CD4 [Veillette et al, Nature 338:257 (1989)].

Antibodies are used to direct the delivery of
therapeutic agents to the cells which express defective
CFTR protein in CF. For this purpose, the antibodies are
incorporated into a vehicle such as a liposome [Matthay
et al, Cancer Res. 46:4904 (1986)] which carries the
therapeutic agent such as a drug or the normal gene.

DNA diagnosis is currently being used to assess
whether a fetus will be born with cystic fibrosis, but
historically this has only been done after a particular
set of parents has already had one cystic fibrosis child,
which identifies them as obligate carriers. However, in
combination with carrier detection as outlined above, DNA
diagnosis for all pregnancies of carrier couples will be
possible. If the parents have already had a cystic
fibrosis child, an extended haplotype analysis can be
done on the fetus and thus the percentage of false
positives or false negatives will be greatly reduced. If
the parents have not already had an affected child and
the DNA diagnosis on the fetus is being performed on the
basis of carrier detection, haplotype analysis can still
be performed.

Although it has been thought for many years that
there is a great deal of clinical heterogeneity in the
cystic fibrosis disease, it is now emerging that there
are two general categories, called pancreatic sufficiency
(CF-PS) and pancreatic insufficiency (CF-PI). If the
mutations related to these disease categories are well
characterized, one can associate a particular mutation
with a clinical phenotype of the disease. This allows
changes in the treatment of each patient. Thus the
nature of the mutation will to a certain extent predict
the prognosis of the patient and indicate a specific
treatment.

The postulate that CFTR may regulate the activity of
ion channels, particularly the outwardly rectifying Cl
channel implicated as the functional defect in CF, can be
tested by the injection and translation of full length in
vitro transcribed CFTR mRNA in Xenopus oocytes. The
ensuing changes in ion currents across the oocyte
membrane can be measured as the potential is clamped at a
fixed value. CFTR may regulate endogenous oocyte
channels or it may be necessary to also introduce
epithelial cell RNA to direct the translation of channel
proteins. Use of mRNA coding for normal and for mutant
CFTR, as provided by this invention, makes these
experiments possible.

Other modes of expression in heterologous cell
systems also facilitate dissection of structure-function
relationships. The complete CFTR DNA sequence ligated
into a plasmid expression vector is used to transfect
cells so that its influence on ion transport can be
assessed. Plasmid expression vectors containing part of
the normal CFTR sequence along with portions of modified
sequence at selected sites can be used in
mutagenesis experiments performed in order to identify
those portions of the CFTR protein which are crucial for
regulatory function.

The mutant DNA sequence can be manipulated in
studies to understand the expression of the gene and its
product, and to achieve production of large quantities
of the protein for functional analysis, antibody
production, and patient therapy. The changes in the
sequence may or may not alter the expression pattern in
terms of relative quantities, tissue-specificity and
functional properties. The partial or full-length cDNA
sequences, which encode the subject protein,
unmodified or modified, may be ligated to bacterial
expression vectors such as the pRIT (Nilsson et al. EMBO
J. 4:1075-1080 (1985)), pGEX (Smith and Johnson, Gene
67:31-40 (1988)) or pATH (Spindler et al. J. Virol.
49:132-141 (1984)) plasmids, which can be introduced into
E. coli cells for production of the corresponding proteins,
which may be isolated in accordance with the previously
discussed protein purification procedures. The DNA
sequence can also be transferred from its existing
context to other cloning vehicles, such as other
plasmids, bacteriophages, cosmids, animal viruses, yeast
artificial chromosomes (YAC) (Burke et al. Science
236:806-812 (1987)), somatic cells, and other simple or
complex organisms, such as bacteria, fungi (Timberlake
and Marshall, Science 244:1313-1317 (1989)),
invertebrates, plants (Gasser and Fraley, Science
244:1293 (1989)), and pigs (Pursel et al. Science
244:1281-1288 (1989)).

For expression in mammalian cells, the cDNA sequence
may be ligated to heterologous promoters, such as the
simian virus (SV) 40 promoter in the pSV2 vector
[Mulligan and Berg, Proc. Natl. Acad. Sci. USA
78:2072-2076 (1981)], and introduced into cells, such as
monkey COS-1 cells [Gluzman, Cell 23:175-182 (1981)], to
achieve transient or long-term expression. The stable
integration of the chimeric gene construct may be
maintained in mammalian cells by biochemical selection,
such as neomycin [Southern and Berg, J. Mol. Appl.
Genet. 1:327-341 (1982)] and mycophenolic acid [Mulligan
and Berg, supra].

DNA sequences can be manipulated with standard
procedures such as restriction enzyme digestion, fill-in
with DNA polymerase, deletion by exonuclease, extension
by terminal deoxynucleotide transferase, ligation of
synthetic or cloned DNA sequences, site-directed
sequence alteration via a single-stranded bacteriophage
intermediate or with the use of specific oligonucleotides
in combination with PCR.

The cDNA sequence (or portions derived from it) or
a mini gene (a cDNA with an intron and its own promoter)
is introduced into eukaryotic expression vectors by
conventional techniques. These vectors are designed to
permit the transcription of the cDNA in eukaryotic cells
by providing regulatory sequences that initiate and
enhance the transcription of the cDNA and ensure its
proper splicing and polyadenylation. Vectors containing
the promoter and enhancer regions of the simian virus
(SV) 40 or long terminal repeat (LTR) of the Rous Sarcoma
virus and polyadenylation and splicing signal from SV 40
are readily available [Mulligan et al, Proc. Natl. Acad.
Sci. USA 78:2072-2076 (1981); Gorman et al, Proc. Natl.
Acad. Sci. USA 79:6777-6781 (1982)]. Alternatively, the
CFTR endogenous promoter may be used. The level of
expression of the cDNA can be manipulated with this type
of vector, either by using promoters that have different
activities (for example, the baculovirus pAC373 can
express cDNAs at high levels in [...] cells [M. D.
Summers and G. E. Smith in Genetically Altered
Viruses and the Environment (B. Fields, et al, eds.), vol.
[...], Cold Spring Harbour [...], Cold
Spring Harbour, New York, 1985]) or by using vectors that
contain promoters amenable to modulation, for example the
glucocorticoid-responsive promoter from the mouse mammary
tumor virus [Lee et al, [...] 294:228]. The
expression of the cDNA can be monitored in the recipient
cells 24 to 72 hours after introduction (transient
expression).

In addition, some vectors contain selectable markers
(such as the gpt [Mulligan and Berg, supra] or neo
[Southern and Berg, J. Mol. Appl. Genet. 1:327-341 (1982)]
bacterial genes) that permit isolation of cells, by
chemical selection, that have stable, long-term
expression of the vectors (and therefore the cDNA) in the
recipient cell. The vectors can be maintained in the
cells as episomal, freely replicating entities by using
regulatory elements of viruses such as papilloma [Sarver
et al, Mol. Cell Biol. 1:486 (1981)] or Epstein-Barr
[Sugden et al, Mol. Cell Biol. 5:410 (1985)].
Alternatively, one can also produce cell lines that have
integrated the vector into genomic DNA. Both of these
types of cell lines produce the gene product on a
continuous basis. One can also produce cell lines that
have amplified the number of copies of the vector (and
therefore of the cDNA as well) to create cell lines that
can produce high levels of the gene product [Alt et al.
J. Biol. Chem. 253:1357 (1978)].

The transfer of DNA into eukaryotic, in particular
human or other mammalian, cells is now a conventional
technique. The vectors are introduced into the recipient
cells as pure DNA (transfection) by, for example,
precipitation with calcium phosphate [Graham and van der
Eb, Virology 52:466 (1973)] or strontium phosphate [Brash
et al, Mol. Cell Biol. 7:2013 (1987)], electroporation
[Neumann et al, EMBO J. 1:841 (1982)], lipofection
[Felgner et al, Proc. Natl. Acad. Sci. USA 84:7413
(1987)], DEAE dextran [McCuthan et al, J. Natl. Cancer
Inst. (1968)], microinjection [Mueller et al, Cell 15:579
(1978)], protoplast fusion [Schafner, Proc. Natl. Acad.
Sci. USA 72:2163] or pellet guns [Klein et al, Nature
327:70 (1987)]. Alternatively, the cDNA can be introduced
by infection with virus vectors. Systems are developed
that use, for example, retroviruses [Bernstein et al,
Genetic Engineering 7:235 (1985)], adenoviruses [Ahmad et
al, J. Virol. 57:267 (1986)] or Herpes virus [Spaete et
al, Cell 30:295 (1982)].

These eukaryotic expression systems can be used for
many studies of the mutant CF gene and the mutant CFTR
product, such as at protein positions 85, 148, 178, 455,
493, 507, 542, 549, 551, 560, 563, 574, 1077 and 1092.
These include, for example: (1) determination that the
gene is properly expressed and that all
post-translational modifications necessary for full
biological activity have been properly completed; (2)
identification of regulatory elements located in the 5'
region of the CF gene and their role in the tissue- or
temporal-regulation of the expression of the CF gene; (3)
production of large amounts of the normal protein for
isolation and purification; (4) use of cells expressing
the CFTR protein as an assay system for antibodies
generated against the CFTR protein or as an assay system
to test the effectiveness of drugs; and (5) study of the
function of the normal complete protein, specific portions
of the protein, or of naturally occurring or artificially
produced mutant proteins. Naturally occurring mutant
proteins exist in patients with CF, while artificially
produced mutant proteins can be designed by site-directed
sequence alterations. These latter studies can probe the
function of any desired amino acid residue in the protein
by mutating the nucleotides coding for that amino acid.

Using the above techniques, the expression vectors
containing the mutant CF gene sequence or fragments
thereof can be introduced into human cells, mammalian
cells from other species or non-mammalian cells as
desired. The choice of cell is determined by the purpose
of the treatment. For example, monkey COS cells
[Gluzman, Cell 23:175 (1981)], which produce high
levels of the SV40 T antigen and permit the replication
of vectors containing the SV40 origin of replication, can
be used to show that the vector can express the protein
product, since function is not required. Similar
treatment could be performed with Chinese hamster ovary
(CHO) or mouse NIH 3T3 fibroblasts or with human
fibroblasts or lymphoblasts.

The recombinant cloning vector, according to this
invention, then comprises the selected DNA of the DNA
sequences of this invention for expression in a suitable
host. The DNA is operatively linked in the vector to an
expression control sequence in the recombinant DNA
molecule so that normal CFTR polypeptide can be
expressed. The expression control sequence may be
selected from the group consisting of sequences that
control the expression of genes of prokaryotic or
eukaryotic cells and their viruses and combinations
thereof. The expression control sequence may be
specifically selected from the group consisting of the
lac system, the trp system, the tac system, the trc
system, major operator and promoter regions of phage
lambda, the control region of fd coat protein, the early
and late promoters of SV40, promoters derived from
polyoma, adenovirus, retrovirus, baculovirus and simian
virus, the promoter for 3-phosphoglycerate kinase, the
promoters of yeast acid phosphatase, the promoter of the
yeast alpha-mating factors and combinations thereof.

The host cell, which may be transfected with the
vector of this invention, may be selected from the group
consisting of E. coli, Pseudomonas, Bacillus subtilis,
Bacillus stearothermophilus or other bacilli; other
bacteria; yeast; fungi; insect; mouse or other animal;
or plant hosts; or human tissue cells.

It is appreciated that for the mutant DNA sequence
similar systems are employed to express and produce the
mutant product.

PROTEIN FUNCTION CONSIDERATIONS

To study the function of the mutant CFTR protein, it
is preferable to use epithelial cells as recipients,
since proper functional expression may require the
presence of other pathways or gene products that are only
expressed in such cells. Cells that can be used include,
for example, human epithelial cell lines such as T84
(ATCC #CRL 248) or PANC-1 (ATCC #CRL 1469), or the T43
immortalized CF nasal epithelium cell line [Jetten et al,
Science (1989)] and primary [Yankaskas et al. Am. Rev.
Resp. Dis. 132:1281 (1985)] or transformed [Scholte et
al. Exp. Cell. Res. 182:559 (1989)] human nasal polyp or
airways cells, pancreatic cells [Harris and Coleman, J.
Cell. Sci. 87:695 (1987)], or sweat gland cells [Collie
et al. In Vitro 21:597 (1985)] derived from normal or CF
subjects. The CF cells can be used to test for the
functional activity of mutant CF genes. Current
functional assays available include the study of the
movement of anions (Cl or I) across cell membranes as a
function of stimulation of cells by agents that raise
intracellular AMP levels and activate chloride channels
[Stutts et al. Proc. Nat. Acad. Sci. U.S.A. 82:6677
(1985)]. Other assays include the measurement of changes
in cellular potentials by patch clamping of whole cells
or of isolated membranes [Frizzell et al. Science 233:558
(1986); Welsh and Liedtke, Nature 322:467 (1986)] or
the study of ion fluxes in epithelial sheets of confluent
cells [Widdicombe et al. Proc. Nat. Acad. Sci. 82:6167
(1985)]. Alternatively, RNA made from the CF gene could
be injected into Xenopus oocytes. The oocyte will
translate RNA into protein and allow its study. As other
more specific assays are developed these can also be used
in the study of transfected mutant CFTR protein function.

"Domain-switching" experiments between mutant CFTR
and the human multidrug resistance P-glycoprotein can
also be performed to further the study of the mutant CFTR
protein. In these experiments, plasmid expression vectors
are constructed by routine techniques from fragments of
the mutant CFTR sequence and fragments of the sequence of
P-glycoprotein ligated together by DNA ligase so that a
protein containing the respective portions of these two
proteins will be synthesized by a host cell transfected
with the plasmid. The latter approach has the advantage
that many experimental parameters associated with
multidrug resistance can be measured. Hence, it is now
possible to assess the ability of segments of mutant CFTR
to influence these parameters.

These studies of the influence of mutant CFTR on ion
transport will serve to bring the field of epithelial
transport into the molecular arena.

THERAPIES

It is understood that the major aim of the various
biochemical studies using the compositions of this
invention is the development of therapies to circumvent
or overcome the CF defect, using both the pharmacological
and the "gene-therapy" approaches.

In the pharmacological approach, drugs which
circumvent or overcome the CF defect are sought.
Initially, compounds may be tested essentially at random,
and screening systems are required to discriminate among
many candidate compounds. This invention provides host
cell systems, expressing various of the mutant CF genes,
which are particularly well suited for use as first level
screening systems. Preferably, a cell culture system
using mammalian cells (most preferably human cells)
transfected with an expression vector comprising a DNA
sequence coding for CFTR protein containing a
CF-generating mutation, for example the I507 deletion, is
used in the screening process. Candidate drugs are
tested by incubating the cells in the presence of the
candidate drug and measuring those cellular functions
dependent on CFTR, especially by measuring ion currents
where the transmembrane potential is clamped at a fixed
value. To accommodate the large number of assays,
however, more convenient assays are based, for example,
on the use of ion-sensitive fluorescent dyes. To detect
changes in Cl- ion concentration, SPQ or its analogues are
useful.
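
The text does not say how a dye signal would be converted into a
chloride reading. One common treatment, assumed here rather than
taken from the text, models SPQ as a dye whose fluorescence is
collisionally quenched by chloride, so that F0/F = 1 + K_sv x [Cl-]
(the Stern-Volmer relation), where F0 is the fluorescence without
chloride. The Python sketch below applies that relation; the function
name is hypothetical, and the quenching constant must be calibrated
for the actual system (the value in the usage lines is a placeholder,
not a measured constant).

```python
def chloride_from_quenching(f0, f, k_sv):
    """Estimate chloride concentration from a Stern-Volmer relation.

    f0   -- fluorescence in the absence of chloride
    f    -- fluorescence measured in the sample
    k_sv -- quenching constant in 1/M (assumed known from calibration)
    Returns the concentration in mol/L.
    """
    if f0 <= 0 or f <= 0 or k_sv <= 0:
        raise ValueError("f0, f and k_sv must all be positive")
    return (f0 / f - 1.0) / k_sv


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    print(chloride_from_quenching(f0=100.0, f=50.0, k_sv=10.0))
```

In a screen of this kind, a candidate drug that changes chloride
movement would show up as a change in the estimated concentration
over time relative to untreated cells.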

Alternatively, a cell-free system could be used.
Purified CFTR could be reconstituted into artificial
membranes and drugs could be screened in a cell-free
assay [Al-Aqwatt, Science (1989)].

At the second level, animal testing is required. It
is possible to develop a model of CF by interfering with
the normal expression of the counterpart of the CF gene
in an animal such as the mouse. The "knock-out" of this
gene by introducing a mutant form of it into the germ
line of animals will provide a strain of animals with
CF-like syndromes. This enables testing of drugs which
showed promise in the first level cell-based screen.

As further knowledge is gained about the nature of
the protein and its function, it will be possible to
predict structures of proteins or other compounds that
interact with the CFTR protein. That in turn will allow
for certain predictions to be made about potential drugs
that will interact with this protein and have some effect
on the treatment of the patients. Ultimately such drugs
may be designed and synthesized chemically on the basis
of structures predicted to be required to interact with
domains of CFTR. This approach is reviewed in Capsey and
Delvatte, Genetically Engineered Human Therapeutic Drugs,
Stockton Press, New York, 1988. These potential drugs
must also be tested in the screening system.

PROTEIN REPLACEMENT THERAPY

Treatment of CF can be performed by replacing the
defective protein with normal protein, by modulating the
function of the defective protein or by modifying another
step in the pathway in which CFTR participates in order
to correct the physiological abnormality.

To be able to replace the defective protein with the
normal version, one must have reasonably large amounts of
pure CFTR protein. Pure protein can be obtained as
described earlier from cultured cell systems. Delivery
of the protein to the affected airways tissue will
require its packaging in lipid-containing vesicles that
facilitate the incorporation of the protein into the cell
membrane. It may also be feasible to use vehicles that
incorporate proteins such as surfactant protein, such as
SAP(Val) or SAP(Phe), that perform this function
naturally, at least for lung alveolar cells (PCT Patent
Application WO/8803170, Whitsett et al, May 7, 1988 and
PCT Patent Application WO89/04327, Benson et al, May 18,
1989). The CFTR-containing vesicles are introduced into
the airways by inhalation or irrigation, techniques that
are currently used in CF treatment (Boat et al, supra).

DRUG THERAPY

Modulation of CFTR function can be accomplished by
the use of therapeutic agents (drugs). These can be
identified by random approaches using a screening program
in which their effectiveness in modulating the defective
CFTR protein is monitored in vitro. Screening programs
can use cultured cell systems in which the defective CFTR
protein is expressed. Alternatively, drugs can be
designed to modulate CFTR activity from knowledge of the
structure and function correlations of CFTR protein and
from knowledge of the specific defect in the CFTR mutant
protein (Capsey and Delvatte, supra). It is possible
that the mutant CFTR protein will require a different
drug for specific modulation. It will then be necessary
to identify the specific mutation(s) in each CF patient
before initiating drug therapy.

Drugs can be designed to interact with different
aspects of CFTR protein structure or function. For
example, a drug (or antibody) can bind to a structural
fold of the protein to correct a defective structure.
Alternatively, a drug might bind to a specific functional
residue and increase its affinity for a substrate or
cofactor. Since it is known that members of the class
of proteins to which CFTR has structural homology can
interact with, bind and transport a variety of drugs, it
is reasonable to expect that drug-related therapies may be
effective in treatment of CF.

A third mechanism for enhancing the activity of an
effective drug would be to modulate the production or the
stability of CFTR inside the cell. This increase in the
amount of CFTR could compensate for its defective
function.

Drug therapy can also be used to compensate for the
defective CFTR function by interactions with other
components of the physiological or biochemical pathway
necessary for the expression of the CFTR function. These
interactions can lead to increases or decreases in the
activity of these ancillary proteins. The methods for the
identification of these drugs would be similar to those
described above for CFTR-related drugs.

In other genetic disorders, it has been possible to
correct for the consequences of altered or missing normal
function by use of dietary modifications. This has
taken the form of removal of metabolites, as in the case
of phenylketonuria, where phenyl[...] where the [...]
correction of [...] addition of the enzyme to the diet.
Thus, once the details of the CFTR function have been
elucidated and the basic defect in CF has been defined,
therapy may be achieved by dietary manipulations.

The second potential therapeutic approach is
so-called "gene-therapy", in which normal [...] to
successfully [...] protein in key epithelial cells of
affected tissues. It is most crucial to attempt to
achieve this with the airway epithelial cells of the
[...]. The [...] gene is delivered to [...] sufficient
[...] protein to provide [...] function. As a
result, the patient's quality and length of life will be
[...] deliver the gene to all affected tissues.

One approach to therapy of CF is to insert a normal
version of the CF gene into the airway epithelium of
affected patients. It is important to note that the
respiratory system is the primary cause of morbidity and
mortality in CF; while pancreatic disease is a major
feature, it is relatively well treated today with enzyme
supplementation. Thus, somatic cell gene therapy [for a
review, see T. Friedmann, Science 244:1275 (1989)]
targeting the airway would alleviate the most severe
problems associated with CF.

A. Retroviral Vectors. Retroviruses have been
considered the preferred vector for experiments in
somatic gene therapy, with a high efficiency of infection
and stable integration and expression [Orkin et al, Prog.
Med. Genet. 7:130 (1988)]. A possible drawback is that
cell division is necessary for retroviral integration, so
that the targeted cells in the airway may have to be
nudged into the cell cycle prior to retroviral infection,
perhaps by chemical means. The full length CF gene cDNA
can be cloned into a retroviral vector and driven from
either its endogenous promoter or from the retroviral LTR
(long terminal repeat). Expression of the
normal protein at levels as low as 10% of the endogenous
mutant protein in CF patients would be expected to be
beneficial, since this is a recessive disease. Delivery
of the virus could be accomplished by aerosol or
instillation into the trachea.

B. Other Viral Vectors. Other delivery systems
which can be utilized include adeno-associated virus
[AAV, McLaughlin et al, J. Virol. 62:1963 (1988)],
vaccinia virus [Moss et al, Annu. Rev. Immunol. 5:305
(1987)], bovine papilloma virus [Rasmussen et al, Methods
Enzymol. 139:642 (1987)] or a member of the herpesvirus
group such as Epstein-Barr virus [Margolskee et al, Mol.
Cell. Biol. 8:2937 (1988)]. Though much would need to be
learned about their basic biology, the idea of using a
viral vector with natural tropism for the respiratory
tract (e.g. respiratory syncytial virus, echovirus,
Coxsackie virus, etc.) is possible.
C. Non-viral Gene Transfer. Other methods of
inserting the CF gene into respiratory epithelium may
also be productive; many of these are lower efficiency
and would potentially require infection in vitro,
selection of transfectants, and reimplantation. These
would include calcium phosphate, DEAE dextran,
electroporation, and protoplast fusion. A particularly
attractive idea is the use of liposomes, which might be
possible to carry out in vivo [Ostro, Liposomes, Marcel
Dekker, 1987]. Synthetic cationic lipids such as DOTMA
[Felgner et al, Proc. Natl. Acad. Sci. USA 84:7413 (1987)]
may increase the efficiency and ease of carrying out this
approach.

CF ANIMAL MODELS

The creation of a mouse or other animal model for CF
will be crucial to understanding the disease and for
testing of possible therapies (for a general review of
creating animal models, see Erickson, Am. J. Hum. Genet.
43:582 (1988)). Currently no animal model of CF
exists. The evolutionary conservation of the CF gene (as
demonstrated by the cross-species hybridization blots for
E4.3 and H1.6), as shown in Figure 4, indicates that an
orthologous gene exists in the mouse (hereafter to be
denoted mCF, and its corresponding protein as mCFTR), and
this will be possible to clone from mouse genomic and cDNA
libraries using the human CF gene probes. It is expected
that the generation of a specific mutation in the mouse
gene analogous to the I507 mutation will be optimal
for reproducing the phenotype, though complete inactivation
of the mCFTR gene will also be a useful mutant to
generate.

A. Mutagenesis. Inactivation of the mCF gene can
be achieved by chemical [e.g. Johnson et al, Proc. Natl.
Acad. Sci. USA 78:3138 (1981)] or X-ray mutagenesis [Popp
et al, J. Mol. Biol. 127:141 (1979)] of mouse gametes,
followed by fertilization. Offspring heterozygous for
inactivation of mCFTR can then be identified by Southern
blotting to demonstrate loss of one allele by dosage, or
failure to inherit one parental allele if an RFLP marker
is being assessed. This approach has previously been
successfully used to identify mouse mutants for α-globin
[Whitney et al, Proc. Natl. Acad. Sci. USA 77:1087
(1980)], phenylalanine hydroxylase [McDonald et al,
Pediatr. Res. 23:63 (1988)], and carbonic anhydrase II
[Lewis et al, Proc. Natl. Acad. Sci. USA ... (1988)].

B. Transgenics. A mutant version of CFTR or mouse
CFTR can be inserted into the mouse germ line using now
standard techniques of oocyte injection [Camper, Trends
in Genetics (1988)]; alternatively, if it is desirable to
inactivate or replace the endogenous mCF gene, the
homologous recombination system using embryonic stem (ES)
cells [Capecchi, Science 244:1288 (1989)] may be applied.

1. Oocyte Injection. Placing one or more copies
of the normal or mutant mCF gene at a random location in
the mouse germline can be accomplished by microinjection
of the pronucleus of a just-fertilized mouse oocyte,
followed by reimplantation into a pseudo-pregnant foster
mother. The liveborn mice can then be screened for
integrants using analysis of tail DNA for the presence of
human CF gene sequences. The same protocol can be used
to insert a mutant mCF gene. To generate a mouse model,
one would want to place this transgene in a mouse
background where the endogenous mCF gene has been
inactivated, either by mutagenesis (see above) or by
homologous recombination (see below). The transgene can
be either: a) a complete genomic sequence, though the
size of this (about 250 kb) would require that it be
injected as a yeast artificial chromosome or a chromosome
fragment; b) a cDNA with either the natural promoter or a
heterologous promoter; c) a "minigene" containing all of
the coding region and various other elements such as
introns, promoter, and 3' flanking elements found to be
necessary for optimum expression.

2. Retroviral Infection of Early Embryos.
This alternative involves inserting the CFTR or mCF gene
into a retroviral vector and directly infecting mouse
embryos at early stages of development, generating a
chimera [Soriano et al, Cell 46:19 (1986)]. At least some
of these will lead to germline transmission.

3. ES Cells and Homologous Recombination. The
embryonic stem cell approach [Capecchi, supra, and
Capecchi, Trends Genet. 5:70 (1989)] allows the
possibility of performing gene transfer and then
screening the resulting totipotent cells to identify the
rare homologous recombination events. Once identified,
these can be used to generate chimeras by injection of
mouse blastocysts, and a proportion of the resulting mice
will show germline transmission from the recombinant
line. There are several ways this could be useful in the
generation of a mouse model for CF:

a) Inactivation of the mCF gene can be conveniently
accomplished by designing a DNA fragment which contains
sequences from a mCFTR exon flanking a selectable marker
such as neo. Homologous recombination will lead to
insertion of the neo sequences in the middle of an exon,
inactivating mCFTR. The homologous recombination events
(usually about 1 in 1000) can be distinguished from the
heterologous ones by DNA analysis of individual clones
[usually using PCR, Kim et al, Nucleic Acids Res. 16:8887
(1988); Joyner et al, Nature 338:153 (1989); Zimmer et al,
supra, p. 150] or by using a negative selection against
the heterologous events [such as the use of an HSV TK
gene at the end of the construct, followed by
ganciclovir selection, Mansour et al, Nature 336:348
(1988)]. This inactivated mCFTR mouse can then be used
to introduce a mutant CF gene or mCF gene containing, for
example, the I507 abnormality or any other desired
mutation.

b) It is possible that specific mutants of mCFTR
can be created in one step. For example, one can make a
construct containing mCF intron 9 sequences at the 5'
end, a selectable neo gene in the middle, and intron 9 +
exon 10 (containing the mouse version of the I507
mutation) at the 3' end. A homologous recombination
event would lead to the insertion of the neo gene in
intron 9 and the replacement of exon 10 with the mutant
version.

c) If the presence of the selectable neo marker in
the intron altered expression of the mCF gene, it would be
possible to excise it in a second homologous
recombination step.

d) It is also possible to create mutations in the
mouse germline by injecting oligonucleotides containing
the mutation of interest and screening the resulting
cells by PCR.
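
The frequency of "about 1 in 1000" quoted in (a) for homologous recombination events gives a rough sense of how many ES cell clones might need to be screened. The following is a minimal sketch only. It assumes that each clone is an independent trial with the same success probability, which the text does not state, and the function name and the 95% confidence target are illustrative choices rather than anything taken from the specification.

```python
import math


def clones_to_screen(frequency, confidence=0.95):
    """Smallest number of clones n such that the chance of seeing at least
    one homologous recombinant, 1 - (1 - frequency)**n, reaches `confidence`.

    Assumes independent clones with identical per-clone frequency
    (an assumption made for this sketch, not a claim of the text).
    """
    if not 0.0 < frequency < 1.0:
        raise ValueError("frequency must lie strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - frequency))


if __name__ == "__main__":
    n = clones_to_screen(1 / 1000)
    print(f"Clones to screen for a 95% chance of one recombinant: {n}")
```

Under these assumptions the required number is roughly three times the reciprocal of the frequency, which is one way to see why PCR screening of pooled clones or negative selection is attractive.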

This embodiment of the invention has considered
primarily a mouse model for cystic fibrosis. Figure 4
shows cross-species hybridization not only to mouse DNA
but also to bovine, hamster and chicken DNA. Thus, it is
contemplated that an orthologous gene will exist in many
other species also. It is thus contemplated that it will
be possible to generate other animal models using similar
technology.

Although preferred embodiments of the invention have
been described herein in detail, it will be understood by
those skilled in the art that variations may be made
thereto without departing from the spirit of the
invention or the scope of the appended claims.
CLAIMS:

1. A DNA molecule comprising an intronless DNA sequence
encoding a mutant CFTR polypeptide having the sequence
according to Figure 1 for amino acid residue positions 1
to 1480 and, further characterized by nucleotide sequence
variants resulting in deletion or alteration of amino
acids of residue positions 85, 148, 178, 455, 493, 507,
542, 549, 551, 560, 563, 574, 1077 and 1092.

2. A DNA molecule comprising an intronless DNA sequence
encoding a mutant CFTR polypeptide having the sequence
according to Figure 1 for DNA sequence positions 1 to
4575 and, further characterized by nucleotide sequence
variants resulting in deletion or alteration of DNA at
DNA sequence positions 129, 556, 621+1, 711+1, 1717-1 and
3659.

3. A DNA molecule comprising an intronless DNA sequence
selected from the group consisting of:
(a) DNA sequences which correspond to the selected
sequence of claim 1 or 2 and which encode, on expression,
for mutant CFTR polypeptide;

(b) DNA sequences which correspond to a fragment of
a selected sequence in claim 1 or 2 including at least 16
nucleotides;

(c) DNA sequences which comprise at least 16
nucleotides and encode a fragment of the selected amino
acid sequence of claim 1 or 2; and

(d) DNA sequences encoding an epitope
characteristic of the mutant CFTR protein encoded by at
least 18 sequential nucleotides in the selected sequence
of claim 1 or 2.

4. The DNA molecule of claim 1 or 2 wherein the DNA
molecule is a cDNA.

5. The DNA molecule of claim 3 wherein the DNA molecule
is a cDNA.

6. A purified RNA molecule comprising an RNA sequence 
corresponding to the DNA sequence recited in claim 3. 

7. A purified nucleic acid probe comprising a DNA or
RNA nucleotide sequence corresponding to the selected
sequence recited in parts (b), (c), or (d) of claim 3.

8. A nucleic acid probe according to claim 7 wherein
said sequence comprises AAA GAA AAT ATC TTT GGT GTT, and
its complement.

9. A recombinant cloning vector comprising the DNA
molecule of claim 3.



10. The vector of claim 9 wherein said DNA molecule is
operatively linked to an expression control sequence in
said recombinant DNA molecule so that a mutant CFTR
polypeptide can be expressed, said mutant CFTR
polypeptide being selected from the group of CFTR
polypeptides at mutant positions 85, 148, 178, 455, 493,
507, 542, 549, 551, 560, 563, 574, 1077 and 1092, said
expression control sequence being selected from the group
consisting of sequences that control the expression of
genes of prokaryotic or eukaryotic cells and their
viruses and combinations thereof.

11. The vector of claim 10 wherein said DNA molecule is
operatively linked to an expression control sequence in
said recombinant DNA molecule so that a mutant CFTR
polypeptide can be expressed, said mutant CFTR
polypeptide being selected from the group of CFTR
polypeptides at mutant DNA sequence positions 129, 556,
621+1, 711+1, 1717-1 and 3659, said expression control
sequence being selected from the group consisting of
sequences that control the expression of genes of
prokaryotic or eukaryotic cells and their viruses and
combinations thereof.



12. The vector of claim 10 or 11 wherein the expression
control sequence is selected from the group consisting of
the lac system, the trp system, the tac system, the trc
system, major operator and promoter regions of phage
lambda, the control region of fd coat protein, the early
and late promoters of SV40, promoters derived from
polyoma, adenovirus, retrovirus, baculovirus and simian
virus, the promoter for 3-phosphoglycerate kinase, the
promoters of yeast acid phosphatase, the promoter of the
yeast alpha-mating factors and combinations thereof.

13. A host transformed with the vector according to
claim 9.

14. The host of claim 13 selected from the group
consisting of strains of E. coli, Pseudomonas, Bacillus
subtilis, Bacillus stearothermophilus, or other bacilli;
other bacteria; yeast; fungi; insect; mouse or other
animal; plant hosts; or human tissue cells.

15. The host of claim 14 wherein said human tissue cells
are human epithelial cells.

16. A method for producing a mutant CFTR polypeptide
comprising the steps of:

(a) culturing a host cell transfected by the vector
of claim 8 in a medium and under conditions favorable for
expression of the mutant CFTR polypeptide selected from
the group having mutant positions 85, 148, 178, 455, 493,
507, 542, 549, 551, 560, 563, 574, 1077 and 1092;

(b) isolating the expressed mutant CFTR
polypeptide.

17. A method for producing a mutant CFTR polypeptide
comprising the steps of:

(a) culturing a host cell transfected by the vector
of claim 8 in a medium and under conditions favorable for
expression of the mutant CFTR polypeptide selected from
the group having mutant DNA sequence positions 129, 556,
621+1, 711+1, 1717-1 and 3659;

(b) isolating the expressed mutant CFTR
polypeptide.

18. A mutant CFTR polypeptide substantially free of 
other human proteins and encoded by the DNA sequence 
recited in claim 3. 

19. A substantially pure mutant CFTR polypeptide 
according to claim 18 made by chemical or enzymatic 
peptide synthesis. 

20. A polypeptide coded for by expression of a DNA
sequence recited in claim 3.

21. A method for screening a subject to determine if
said subject is a CF carrier or a CF patient comprising
the steps of:

providing a biological sample of the subject to be
screened; and providing an assay for detecting in the
biological sample, the presence of at least a member from
the group consisting of a mutant CF gene, a mutant CFTR
polypeptide products and mixtures thereof, the mutants
being defined by mutations at protein positions 85, 148,
178, 455, 493, 507, 542, 549, 551, 560, 563, 574, 1077
and 1092.

22. A method for screening a subject to determine if
said subject is a CF carrier or a CF patient comprising
the steps of:

providing a biological sample of the subject to be
screened; and providing an assay for detecting in the
biological sample, the presence of at least a member from
the group consisting of a mutant CF gene, a mutant CFTR
polypeptide products and mixtures thereof, the mutants
being defined by mutations at DNA sequence positions 129,
556, 621+1, 711+1, 1717-1 and 3659.

23. The method of claim 21 or 22 wherein the biological
sample includes at least part of the genome of the
subject and the assay comprises a hybridization assay.

24. The method of claim 23 wherein the assay further
comprises a labelled nucleotide probe according to claim
7.

25. The method of claim 24 wherein said probe comprises
the nucleotide sequence of claim 8.

26. The method of claim 21 or 22 wherein the biological
sample includes a CFTR polypeptide of the subject and the
assay comprises an immunological assay.

27. The method of claim 26 wherein the assay further
includes an antibody specific for said mutant CFTR
polypeptide.

28. The method of claim 26 wherein the assay is a
radioimmunoassay.

29. The method of claim 27 wherein the antibody is at
least one monoclonal antibody.

30. The method of claim 21 or 22 wherein the subject is
a human fetus in utero.
31. The method of claim 24 wherein the assay further
includes at least one additional nucleotide probe
according to claim 7.

32. The method of claim 31, wherein the assay further
includes a second nucleotide probe comprising a different
DNA sequence fragment of the DNA of Figure 1 or its RNA
homologue or a different DNA sequence fragment of human
chromosome 7 and located to either side of the DNA
sequence of Figure 1.






33. In a process for screening a potential CF carrier or
patient to indicate the presence of an identified cystic
fibrosis mutation in the CF gene, said process including
the steps of:

(a) isolating genomic DNA from said potential CF
carrier or said potential patient;

(b) hybridizing a DNA probe onto said isolated
genomic DNA, said DNA probe spanning a mutation in said
CF gene wherein said DNA probe is capable of detecting
said mutation, said mutation being selected from the
group of mutations at protein positions 85, 148, 178,
455, 493, 507, 542, 549, 551, 560, 563, 574, 1077 and
1092;

(c) treating said genomic DNA to determine presence
or absence of said DNA probe and thereby indicating in
accordance with a predetermined manner of hybridization,
the presence or absence of said cystic fibrosis mutation.

34. In a process for screening a potential CF carrier or
patient to indicate the presence of an identified cystic
fibrosis mutation in the CF gene, said process including
the steps of:

(a) isolating genomic DNA from said potential CF
carrier or said potential patient;

(b) hybridizing a DNA probe onto said isolated
genomic DNA, said DNA probe spanning a mutation in said
CF gene wherein said DNA probe is capable of detecting
said mutation, said mutation being selected from the
group of mutations at DNA sequence positions 129, 556,
621+1, 711+1, 1717-1 and 3659.

35. A process for detecting cystic fibrosis carriers of
a mutant CF gene wherein said process consists of
determining differential mobility of heteroduplex PCR
products in polyacrylamide gels as a result of deletions
or alterations in the mutant CF gene at one or more of
the protein positions 85, 148, 178, 455, 493, 507, 542,
549, 551, 560, 563, 574, 1077 and 1092.

36. A process for detecting cystic fibrosis carriers of
a mutant CF gene wherein said process consists of
determining differential mobility of heteroduplex PCR
products in polyacrylamide gels as a result of deletions
or alterations in the mutant CF gene at one or more of
the DNA sequence positions 129, 556, 621+1, 711+1, 1717-1
and 3659.

37. A kit for assaying for the presence of a mutant CF
gene by immunoassay comprising:

(a) an antibody which specifically binds to a gene
product of a mutant CF gene having a mutation at a
protein position selected from the group consisting of
protein positions 85, 148, 178, 455, 493, 507, 542, 549,
551, 560, 563, 574, 1077 and 1092;

(b) reagent means for detecting the binding of the
antibody to the gene product; and

(c) the antibody and reagent means each being
present in amounts effective to perform the immunoassay.

38. A kit for assaying for the presence of a mutant CF
gene by immunoassay comprising:

(a) an antibody which specifically binds to a gene
product of a mutant CF gene having a mutation at a DNA
sequence position selected from the group consisting of
DNA sequence positions 129, 556, 621+1, 711+1, 1717-1 and
3659;

(b) reagent means for detecting the binding of the
antibody to the gene product; and

(c) the antibody and reagent means each being
present in amounts effective to perform the immunoassay.

39. The kit of claim 37 or 38 wherein said reagent means 
for detecting binding is selected from the group 
consisting of fluorescence detection, radioactive decay 
detection, enzyme activity detection or colorimetric 
detection. 

40. A kit for assaying for the presence of a CF gene by
hybridization comprising:

(a) an oligonucleotide probe which specifically
binds to a mutant CF gene;

(b) reagent means for detecting the hybridization
of the oligonucleotide probe to a mutant CF gene having a
mutation at a protein position selected from the group
consisting of protein positions 85, 148, 178, 455, 493,
507, 542, 549, 551, 560, 563, 574, 1077 and 1092; and

(c) the probe and reagent means each being present
in amounts effective to perform the hybridization assay.

41. A kit for assaying for the presence of a CF gene by
hybridization comprising:

(a) an oligonucleotide probe which specifically
binds to a mutant CF gene;

(b) reagent means for detecting the hybridization
of the oligonucleotide probe to a mutant CF gene having a
mutation at a DNA sequence position selected from the
group consisting of DNA sequence positions 129, 556,
621+1, 711+1, 1717-1 and 3659; and

(c) the probe and reagent means each being present
in amounts effective to perform the hybridization assay.



42. An animal comprising a heterologous cell system
comprising a recombinant cloning vector of claim 9 which
induces cystic fibrosis symptoms in said animal.

43. The animal of claim 42 wherein said animal is a
mammal.

44. The animal of claim 43 wherein said mammal is a
rodent.

45. The animal of claim 44 wherein said rodent is a
mouse.

46. In a polymerase chain reaction to amplify a selected
exon of a cDNA sequence of Figure 1, the use of
oligonucleotide primers from intron portions near the 5'
and 3' boundaries of the selected exon of Figure 18.

47. In a polymerase chain reaction of claim 46, the use
of oligonucleotide primers Xi-5 and Xi-3 of Table 5 where
X is the exon number 1, 3, 4, 5, 6a, 6b, 7 through 13,
14a, 14b, 15 and 16, 17a, 17b and 18 through 24.
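
The probe sequence recited in claim 8, together with "its complement", can be examined with a short sketch. It computes the reverse complement of the probe and checks how the probe relates to the corresponding stretch of Figure 1 (the codons for residues 503 to 510, as best they can be read from the scan). The reference window, the function names and the offset convention are assumptions made for illustration and are not part of the claims.

```python
# Complement table for unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def three_base_deletions(reference, probe):
    """Return every 0-based offset i at which deleting reference[i:i+3]
    yields exactly `probe`."""
    return [
        i
        for i in range(len(reference) - 2)
        if reference[:i] + reference[i + 3:] == probe
    ]


# Probe of claim 8, spaces removed.
PROBE = "AAA GAA AAT ATC TTT GGT GTT".replace(" ", "")

# Assumed normal sequence for residues 503-510 (K E N I I F G V),
# transcribed from the Figure 1 listing.
REFERENCE = "AAAGAAAATATCATCTTTGGTGTT"

if __name__ == "__main__":
    print("probe complement (5'->3'):", reverse_complement(PROBE))
    print("3-base deletion offsets:", three_base_deletions(REFERENCE, PROBE))
```

Because the two isoleucine codons in this window are both ATC, several deletion offsets give the same 21-nucleotide result. The probe is therefore consistent with loss of a single isoleucine codon in this region, but the sequence alone does not say which of the two adjacent codons was lost.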
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V AATTGGAAGCAAATGACATCACACCAGGTCACAGAAAAAGGGTTGAGCGGCAGGCACCCA 



61 GAGTAGTAGGTCTTTGGCATTAGGAGCTTGAGCCCAGACGGCCCTAGCAGGGACCCCAGC 

MQRSPL EKA SVVSKLF 16 
1 2 1 GCCCGAGAGACCATGCAGAGGTCGCCTCTGGAAAAGGCCAGCGTTGTCTCCAAACTTTTT 

F 4 W T R P I LRKGYRQRLELSD 36 
1 8 1 TTCAQCTGGACCAGACCAATTTTGAGGAAAGGATACAGACAGCGCCTGGAATTGTCAGAC 

lYQIPSVDSADNLSEKLEri-E 56 
241 ATATACC AAATCCCTTCTGTTGATTCTGCTGACAATCTATCTG AAAAATTGGAAAOAGAA 

WDRELASKKNPKLI N A L R R C 76 

3 0 1 TGGGATAGAGAGCTGGCTTCAAAGAAAAATCCTAAACTCATTAATGCCCTTCGGCGATGT 

F F W R l F M F Y G T F f. Y L G I E V T K A 1 J96 
361 TTTTTCTGGAGATTTATGTTCTATGGAATCTTTTTATATTTAGGGpAAGTCACCAAAGCA 

I V O P L L L I G R I IASYDP DNKEE 116 

4 2 1 GTACAGCCTCTCTTACTGGGAAGAATCATAGCTTCCTATGACCCGGATAACAAGGAGGAA 

R lSTA TYLGTGLCI, 1, F I V R T D 136 

4 8 1 cgctctatcgcgatttatctaggcataggcttatgccttctctttattgtgaggacactg 

| i l i hpai fglhhigmqmr1am 156 
541 ctcctac accc agcc atttttggccttcatcacattggaatgc ag atgagaatagct atg 

f s l i y k k |t lklssrv ldkis 176 
601 tttagtttgatttataagaagKctttaaagctgtcaagccgtgttctagataaaataagt 

igqlvsllsnnlnkfde i g l l a i 196 
661 attggacaacttgttagtctccttrccaacaacctgaacaaatttgatgaaugacttgca 

1 1, A H F V W I AP LOVALLMGL il W 216 
7 2 1 TTGGCACATTTCGTGTGGATCGCTCCTTTGCAAGTGGCACTCCTCATGGGGCTAATCTGG 

ELL Q iASAFCGLGFL 1 VLALFl 236 

7 8 1 GAGTTGTTACAGGCGTCTGCCTTCTGTGGACTTGGTTTCCTGATAGTCCTTGCCCTTTTT 

I O A G L G I RMMMKYRDQRAGK I S 256 

8 4 1 C AGGCTGGGCTAGGGAG AATG ATGATG AAGTACAGAG ATCAGAGAGCTGGG AAGATC AGT 

ERLV ITSEMIENIQSVKAYC 276 

9 0 1 G AAAGACTTGTGATTACCTCAGAAATGATTG AAAATATCCAATCTGTTAAGGCATACTGC 

WEEAMEKHIENLRdTELKLT 296 
961 TGGGAAGAAGCAATGGAAAAAATGATTGAAAACTTAAGACWACAGAACTGAAACTGACT 

RKAAYVRYFN s )SAFFFSGFf! 316 

1 02 1 cggaaggcagcctatgtgagatacttcaatacx:tcagccttcttcttctcagggttcttt 

IvVFLSVLPYAlTI M I I L R K J 336 

1081 gtggtgtttttatctgtgcttccctatgcactaatcaaagg aatcatcctccggaaaata 

iFTTlSFCTV.LRMAVl T R Q F P W 356 
1141 TTCACCACCATCTCATTCTGCA1 TGTTCTGCGCATGGCGGTCACTCGGCAATTTCCCTGG 

AVQTWYDSLGAINKI QlDFLQ 376 
1201 GCTGTACAAACATGGTATG ACTCTCTTGGAGCAATAAACAAAATACAGbATTTCTTACAA 

KQEYKTLEYNLTTTEVVMEN 396 
12 61 AAGCAAG AATATAAGACATTGGAATATAACTTAACGACTAC AG AAGTAGTGATGGAGAAT 

VTAFWEElGFGELFEKAKQNN 416 
1321 GT AAC AGCCTTCTGGG AGG AG PG ATTTGGGG AATT ATTTGAGAAAGC AAAAC AAAAC AAT 
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FIG. 1 (cont'd) 

o 

NNRKTSNGDDSLFFSNrsLL 
AACAATAGAAAAACTTCTAATGGTGATGACAGCCTCTTCTTCAGTAATTTCTCACTTCTT 
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GTPVLKD IwrKIERGQLl, X V 456 
GGTACTCCTGTCCTGAAAGATATTAATTTCAAGATAGAAAGAGGACAGTTGTTGGCGGTT 



H G E L E 47 6 



GCTGGATCCACTGGAGCAGGCAAbACTTCACTTCTAATGATGATTATGGGAGAACTGGAG 

— E § 1 — 2 KlK HSGRlsrrg o r S H 4 96 

CCTTCAGAGGGTAAAATTAAGCACAGTGGAAGAATTTCATTCTCTTCTCAGTTTTCCTGG 

IMPGTIK EKIIrcv^v D E Y R 516 
ATTATGCCTGGCACCATTAAAGAAAATATCA TCTTTGGTGTTTCCTATGATGAATATAGA 

TACAGAAGCGTCATCAAAGCATGCCAACTAGAAGA<i;ACATCTCX:AAGTrTGCAGAGA^ 
GACAATATAGTTCTTG^ " 6 



576 



TCTTTAGCAAGhGCAGTATACAAAGATGCTGATTTGTATTTATTAGACTCTCCTTTTGGA 

YLDVLTEK *IFE<*CVCKLMA 596 
TACCTAGATGTTTTAACAGAAAAAGAAATATTTGAAAQTTGTGTCTGTAAACTGATGGCT 

KKTRILVTSKMEHLKKAD KI 616 
AACAAAACTAGGATTTTGGTCACTTCTAAAATGGAACATTTAAAGAAAGCTGACAAAATA 

LILHEGSSYFYGTFSELONL 636 
TTAATTTTGCATGAAGGTAGCAGCTATTTTTATGGGACATTTTCAGAACTCCAAAATCTA 

QPDFSSK LMGCDSFOQ FSAE 656 
CAGCCAGACTITAGCTCAAAACTCATGGGATGTGATTCTTTCGACCAATTTAGTGCAGAA 

RRNSI LTETLHRFSLEGDAP 676 
AGAAGAAATTCAATCCTAACTGAGACCTTACACCGTTTCTCATTAGAAGGAGATGCTCCT 

VS WTETKKQS°FKQTGE FGEK 696 
GTCTCCTGGACAGAAACAAAAAAACAATCTTTTAAACAGACTGGAGAGTTTGGGGAAAAA 

RKNSI LNPINSIRKFS IVQK 716 
AGGAAGAATTCTATTCTCAATCCAATCAACTCTATACGAAAATTTTCCATTGTGCAAAAG 

TP LQHNG IEEDSDEP LE RRL 736 
VACTCCCTTACAAATGAATGGCATCGAAGAGGATTCTGATGAGCCTTTAGAGAGAAGGCTG 
c 

SLVP-DSE0GEAILPR1SVIS 756 
TCCTTAGTACCAGATTCTGAGCAGGGAGAGGCGATACTGCCTCGCATCAGCGTGATCAGC 

TGPTLQARRRQSVLNLMTHS 776 
ACTGGCCCCACGCTTCAGGCACGAAGGAGGCAGTCTGTCCTGAACCTGATGACACACTCA 

VNQGQN I HRKTTASTRKVSL 796 
GTTAACCAAGGTCAGAACATTCACCGAAAGACAaCAGCATCCACACGAAAAGTGTCACTG 

APQANLTELDIYSRRLSOET 816 
GCCCCTCAGGCAAACTTGACTGAACTGGATATATATTCAAGAAGGTTATCTCAAGAAACT 
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GLEISEEINEEDLKJeCFFDD 836 
2 581 GGCTTCC WTAAGTGAAGAAATTAACGAAGAAGACTTAAA<^AGTGCCTTTTTGATGAT 

MESIPAVTTWNTYLRYITVH 856 
2 641 ATGGAGAGCATACCAGCAGTGACTACATGGAAC ACATACCTTCGATATATTACTGTCCAC 

K S L l T F V !, T W C T- V I F I. A E V A Al 876 
2 7 01 AAGAGCTTAATTTTTGTGCTAATTTGGTGCTTAGTAATTTTTCTGGCAGAGGTGGCTGCT 

. , *v 

I <; I V vl LWLlrGKjTP LQDKGNST 896 
2 7 61 TCTTTGGTTGTGCTGTGGCTCCTTGGAAAfcACTCCTCTTCAAG ACAAAGGGAATAGTACT 

hsr-Tnsyaviitst sis y y v rl 

2 821 CATAGTAGAAATAACAGCTATGCAGTGATTATC ACCAGCACCAGTTCGTATTATGTGTTT 

tYTYVGVAnTT.L AMG F~71 R G L P 936 
2 881 TACATTTACGTGGGAGTAGCCGACACTTTGCTTGCTATGGGATTCTTCAGAGGTCTACCA 

L V H T L I T V S K I LHHKMLHS V- 956 
2 941 CTGGTGCATACTCTAATCACAGTGTGGAAAATTTTACACCACAAAATGTTACATTCTGTT 

LQAPHSTLNTLKAfeGlLNRF 976 
3001 CTTCAAGCACCTATGTCAACCCTC^CACGTTGAAAGCAGtTGGGATTCTTAATAGA 

SKOIAI LODLLP L T I I F D F I Q I 996 
3061 TCCAAAGATATAGCAATTTTGGATGACCTTCTGCCTCTTACCATATTTGACTTCATCCAO 

ILLL1VIGA1AVVA V l] Q P i Y 1 f1 1016 
3121 TTGTTATTAATTGTGATTGGAGCTATAGCAGTTGTCGCAGTTTTACAACCCTACATCTTT 

IVATVPVTVArT HT.RAYFLl Q T 1036 
3181 GTTGCAACAGTGCCAGTGATAGTGGCTTTTATTATGTTGAGAGCATATTTCCTCCAAACC 

SOQLKQLESEGRSP IFTHLV 1056 
3241 TCTlCAGCMCTCAAACAACTGGAATCTGAAGGCAGGAGTCCAATTTTCACTCATCTTGTT 

TSLKG LWTLRA F G R Q P V F E T 1076 
3301 ACAAGCTTAAAAGGACTATGGACACTTCGTGCCTTCGGACGGCAGCCTTACTTTGAAACT 

LFHKALNLHTAN WFLYLSTL 1096 

33 61 CTGTTCCACAAAGCTCTG AATTTACATACTGCCAACTGGTTCTTGTACCTGTCAACACTG 

R W F 0 M RlIEHlFV'IFFIAVTFl 1116 

34 21 CGCTGGTTCCAAATGAGAATAGAAATGATTTTTGTCATCTTCTTCATTGCTGTTACCTTC 

I T S T L T T |Gl E G E G F ( V G I T L T L~a1 1136 
34 81 ATTTCCATTTTAACAACAC|GAGAAGGAGAAGGAAGAGTTGGTATTATCCTGACTTTAGCC 

IhNTM STLOWAVNSSI I 0 V D S L| 1156 
3541 ATGAATATCATGAGTACATTGCAGTGGGCTGTAAACTCCAGCATAGATGTGG ATAGCTTC 

MRSVSRVFKFI DM PT EGKPT 1176 
3601 ATGCGATCTGTGAGCCGAGTCTTTAAGTTC ATTG ACATGCCAAC AGAAGGTAAACCTACC 

KSTKP YKNGQLSKVMI I EN S 1196 
36 61 AAGTCAACCAAACCATAC AAGAATGGCC AACTCTCG AAAGTTATGATTATTG AG AATTC A 

HVKKDDIWP SGGQMTVKDLT 1216 
3721 CACGTGAAGAAAGATGACATCTGGCCCTCAGGGGGCCAAATGACTGTCAAAGATCTCACA 

A K YTEGGKA1LEHI3TSISF 1236 
3781 GC AAAATACACAGAAGGTGGAAATGCCATATTAG AGAACATTTCCTTCTCAATAAGTCCT 

GQRIvgLLGRTC SGKSTLLS A 1256 
3841 GGCCAGAGGCTGGGCCTCTTGGGAAGAACTGGATCAGGGAAGAGTACTTTGTTATCAGCT 



SUBSTITUTE SHEET. 



WO 91/10734 



PCT/CA9 1/00009 



414b 
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FLR LLNTEGEI QIDGVS WPS 127 6 
3901 TTTTTGAGACTACTGAACACTGAAGGAGAAATCCAGATCGATGGTGTGTCTTGGGATTCA 

I TLQOWRKAFGVIPQKVFir 129 6 

39 61 ATAACTTTGCAACAGTGGAGGAAAGCCTTTGG AGTGATACCACAGAAAGTATTTATTTTT 



4021 TCTGGAAC ATTT AG AAAAAACTTGGATCCCTATG AACAGTGGAGTGATCAAG AAATATGG 

KVADEjVGLRS VIEQFPGKLD 133 6 
4081 AAAGTTGCAGATGAGtTTGGGCTCAGATCTGTGATAGAACAGTTTCCTGGGAAGCTTGAC 

rVLVDCCCVLSHGHKQLMCL 1356 
4141 TTTGTCCTTGTGGATGGGGGCTGTGTCCTAAGCCATGGCCACAAGCAGTTGATGTGCTTG 

ARSVLSKAKILLL DEPSAHL 1376 
4201 GCTAGATCTGTTCTCAGTAAGGCGAAGATCTTGCTGCTTGATGAACCCAGTGCTCATTTG" 

PPVjTVOI I R R TLKQAFADCT 1396 



42 61 GATCCAGTKACATACCAAATAATTAGAAGAACTCTAAAACAAGCATTTGCTGATTGCACA 

V I LCEHR1 EAMLECQQF L I V I 1416 

4321 GTAATTCTCrGTGAACACAGGATAGAAGCAATGCTGGAATGCCAACAATTTTTd3TCATA 

EENKVRQYOSI OKLLNERS L 1436 

4381 GAAGAGAACAAAGTGCGGCAGT ACGATTCCATCC AG AAACTGCTGAACGAGACGAGCCTC 

FRQAI SPSDRVKLFPHRNS S 1456 

44 41 TTCCGGCAAGCCATCAGCCX^CTCCGACAGGGTGAAGCTCTTTCCCCACCGGA-ACTCAAGC 

KCKSKPQ1AALKEETEEEVQ 1476 

4501 AAGTGC AAGTCT AAGCCCCAGATTGCTGCTCTGAAAGAGGAGACAG AAGAAG AGGTGC AA 

D T R L - 1 480 

4561 GATACAAGGCTTTAGAGAGCAGCATAAATGTTGACATGGGACATTTGCTCATGGAATTGG 

4 621 AGCTCGTGGGACAGTCACCTCATGGAATTGGAGCTCGTGGAACAGTTACCTCTGCCTCAG 

4 681 AAAACAAGGATGAATTAAGTTTTTTTTTAAAAAAGAAACATTTGGTAAGGGGAATTGAGG 

4741 ACACTGATATGGGTCTTGATAAATGGCTTCCTGGCAATAGTCAAATTGTGTGAAAGGTAC 

4801 TTCAAATCCTTGAAGATTTACCACTTGTGTTTTGCAAGCCAGATTTTCCTGAAAACrcTT 

48 61 GCCATGTGCTAGTAATTGGAAAGGCAGCTCTAAATGTCAATCAGCCTAGTTG ATCAGCTT 

4 921 ATTGTCTAGTGAAACTCGTTAATTTGTAGTGTTGGAGAAGAACTGAAATCATACTTCTTA 

4 981 GGGTTATGATTAAGTAATGATAACTGGAAACTTCAGCGGTTTATATAAGCTTGTATTCCT 
5041 TTTTCTCTCCTCTCCCCATGATGTTTAGAAACACAACTATATTGTTTGCTAAGCATTCCA 
5101 ACTATCTCATTTCCAAGCAAGTATTAGAAT ACCACAGGAACCACAAGACTGCACATCAAA 
5161 AT ATGCCCCATTCAACATCTAGTG AGCAGTCAGGAAAGAGAACTTCCAGATCCTGG AAAT 
S2 2 1 CAGGGTTAGTATTGTCCAGGTCTACCAAAAATCTCAATATTTCAGATAATCACAATACAT 
52 81 CCCTTACCTGGGAAAGGGCTGTTATAATCTTTCACAGGGGACAGGATGGTTCCCTTGATG 
5341 AAGAAGTTGATATGCCTTTTCCCAACTCCAG AAAGTGACAAGCTCAC AGACCTTTG AACT 
54 01 AG AGTTTAGCTGGAAAAGTATGTTAGTGCAAATTGTCACAGG AC AGCCCTTCTTTCCACA 
54 61 GAAGCTCCAGGTAGAGGGTGTGTAAGTAGATAGGCCATGGGCACTGTGGGTAGACACACA 
5521 TG AAGTCC AAGCATTTAGATGT ATAGGTTG ATGGTGGTATGTTTTC AGGCTAGATGTATG 

5 S 8 1 TACTTC ATGCTGTCTACACTAAGAGAGAATGAGAGACACACTGAAG AAGCACCAATCATG 
5641 MTTAGTTTTATATGCTTCTGTTTTATAATTTTGTGAAGCAAAATTTTTTCTCTAGGAAA 
57 01 TATTTATTTTAATAATGTTTCAAACATATATTACAATGCTGTATTTTAAAAG AATGATTA 
57 61 TGAATTACATTTGTATAAAATAATTTTTATATTTGAAATATTGACTTTTTATGGCACTAG 
5821 TATTTTTATG AAATATTATGTTAAAACTGGGACAGGGG AG AACCTAGGGTGATATT AACC 
5881 AGGGGCCATG AATC ACCTTTTGGTCTGG AGGG AAGCCTTGGGGCTGATCG AGTTGTTGCC 
5941 CACAGCTGTATGATTCCCAGCCAGACACAGCCTCTTAGATGCAGTTCTGAAGAAGATGGT 
6001 ACCACCAGTCTGACTGTTTCCATCAAGGGTACACTGCCTTCTCAACTCCAAACTGACTCT 
€ 0 6 1 TAAG AAG ACTGC ATTATATTTATTACTGTA AG AAAATATCACTTGTCAAT AAAATCCATA 
6121 CATTTGTGT <A) n 
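
The claims refer to mutations both by amino acid residue position and by DNA sequence position in Figure 1. A small sketch of the conversion between the two numbering schemes follows. It assumes, from the line labels of the listing above, that the initiating ATG begins at nucleotide 133 and that the coding region runs for 1480 codons followed by a stop codon. The constant and function names are hypothetical, and intronic positions such as 621+1 fall outside this cDNA numbering and are not handled.

```python
ATG_START = 133     # assumed first nucleotide of the Met-1 codon in Figure 1
N_RESIDUES = 1480   # residues 1 to 1480, as recited in claim 1


def codon_span(residue):
    """Return (first, last) cDNA nucleotide positions of a residue's codon."""
    if not 1 <= residue <= N_RESIDUES:
        raise ValueError("residue outside 1..1480")
    first = ATG_START + 3 * (residue - 1)
    return first, first + 2


def residue_at(nucleotide):
    """Return the residue number whose codon contains a coding nucleotide."""
    offset = nucleotide - ATG_START
    if offset < 0 or offset >= 3 * N_RESIDUES:
        raise ValueError("nucleotide outside the coding region")
    return offset // 3 + 1


# The stop codon follows the last residue; under these assumptions it ends
# at nucleotide 4575, the upper bound recited in claim 2.
STOP_SPAN = (codon_span(N_RESIDUES)[1] + 1, codon_span(N_RESIDUES)[1] + 3)

if __name__ == "__main__":
    for res in (85, 507, 1092):
        print(res, codon_span(res))
    print("stop codon:", STOP_SPAN)
```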
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